Abstract. We consider situations where data have been collected such
that the sampling depends on the outcome of interest and possibly fur- 
ther covariates, as for instance in case-control studies. Graphical models 
represent assumptions about the conditional independencies among the 
variables. By including a node for the sampling indicator, assumptions 
about sampling processes can be made explicit. We demonstrate how 
to read off such graphs whether consistent estimation of the association 
between exposure and outcome is possible. Moreover, we give sufficient 
graphical conditions for testing and estimating the causal effect of ex- 
posure on outcome. The practical use is illustrated with a number of 
examples. 

Key words and phrases: Causal inference, collapsibility, odds ratios,
selection bias. 
1. INTRODUCTION

Nonrandom sampling poses a challenge for the 
statistical analysis especially of observational data. 
We focus here on the problem of outcome-dependent 
sampling, where the inclusion of a unit into the sam- 
ple depends, possibly in some indirect way, on the 
outcome of interest, and possibly on further vari- 
ables. The prime examples are case-control studies, 
which have been surrounded by a long controversy, 
but are now one of the most popular designs in ob- 
servational epidemiology (Breslow, 1996). Any ob- 
servational study based on volunteers is also poten- 
tially sampled depending on the outcome, as the 
willingness to participate can never be safely assumed
to be independent of the outcome of interest, for
example, income. In other situations the
outcome-dependent sampling may not be obvious, 
such as retrospective time-to-event studies (Wein- 
berg, Baird and Rowland, 1993). 

A superficial statistical analysis will typically be 
biased under nonrandom sampling. It is therefore 
important to investigate and understand the assump- 
tions and limitations underlying valid inference in 
such situations. Most approaches make very spe- 
cific parametric modeling assumptions, including as- 
sumptions about the selection mechanism, sometimes 
accompanied by a sensitivity analysis (see, e.g., Copas
and Li, 1997, or McCullagh, 2008). As an alternative, in
this article we investigate the potential of graphical
models to address the problem of
outcome-dependent sampling and restrict any as- 
sumptions to be nonparametric and only in terms of 
conditional independencies. A graphical model rep- 
resents variables as nodes and uses edges between 
nodes so that separations reflect conditional inde- 
pendencies in the underlying model (see, e.g., Whit- 
taker, 1990, or Lauritzen, 1996). 

A key element of our proposed approach is to in- 
clude a separate node as a binary selection indicator 
in the graph so as to represent structural assump- 
tions about how the sampling mechanism is related 
to exposure, outcome, covariates and possibly hidden
variables. A similar idea appears in the works of
Cooper (1995), Cox and Wermuth (1996), Geneletti, 
Richardson and Best (2009) and Lauritzen and 
Richardson (2008). Our main results in this arti- 
cle address the (graphical) characterization of situ- 
ations where the typical null hypothesis of no asso- 
ciation or no causal effect can be tested, and, when 
all variables are categorical, where (causal) odds ra- 
tios can be consistently estimated. These graphi- 
cal rules do not require particular parametric con- 
straints and essentially capture when the model is 
collapsible over the selection indicator. While our re- 
sults on testing are general, estimation is restricted 
to odds ratios as these (or functions thereof) are the 
only measures of association that do not depend on 
the marginal distributions (Edwards, 1963; Altham, 
1970) and for which results can be obtained without 
specific parametric assumptions. 

The outline of the article is as follows. In Sec- 
tion 2 we review basic concepts of graphical models, 
and highlight how a binary sampling node can be 
included so as to make assumptions about the sam- 
pling process explicit (Section 2.4). The correspond- 
ing graphs can be constructed in different ways, for 
instance in a prospective or retrospective manner 
representing different types of assumptions. Section 
3 revisits the notion of collapsibility, which is funda- 
mental for being able to 'ignore' outcome-dependent 
sampling. Sections 3.3 and 3.4 provide the central re- 
sults that allow us to estimate an odds ratio (Corol- 
lary 4) or test for an association (Theorem 6) under 
outcome-dependent sampling. Section 3.5 illustrates 
this with a data example. We then move on to causal 
inference in Section 4, where we define a causal effect 
as the effect of an intervention. We formalize this us- 
ing intervention indicators, and adapt the graphical 
representation by adding a corresponding decision 
node yielding so-called influence diagrams (Dawid, 
2002). In analogy to the associational case, Theorem
7 establishes (graphically verifiable) conditions un- 
der which a prospective causal effect can be tested 
or estimated from retrospective, that is, outcome- 
dependent, data. In Section 5 we present new re- 
sults that apply to less obvious cases of outcome- 
dependent sampling (Theorems 8 and 9). 

2. GRAPHICAL MODELS 

We start with a brief overview of graphical mod- 
els. Our notation follows closely that of Lauritzen 
Fig. 1. Examples of undirected graphs.



(1996). A graph $G = (V, E)$ with nodes or vertices
V and edges E is combined with a statistical model, 
that is, a distribution P on a set of variables that are 
identified with the nodes of the graph. The distribu- 
tion P has to be such that the absence of an edge in 
the graph represents a certain conditional indepen- 
dence between the corresponding pair of variables. 
Conditional independence of A and B given C is
denoted by $A \perp\!\!\!\perp B \mid C$ (Dawid, 1979).

The type of graph dictates the specific conditional 
independence induced by the absence of an edge. 
A basic distinction is between undirected graphs, 
directed acyclic graphs (DAGs) and chain graphs. 
We review the former two but do not go into detail 
for chain graphs. 

2.1 Undirected Graphs 

In an undirected graph $G$ all edges are undirected,
represented by $a - b$. All nodes in $V \setminus \{a\}$ that have an
edge with $a \in V$ form the boundary $\mathrm{bd}(a)$ of $a$. We
say that two sets of nodes $A$ and $B$ are separated
by a third set $C$ if any path along the edges of the
graph between $A$ and $B$ includes vertices in $C$. In
particular, each set $A$ is separated by its boundary
from all other nodes $V \setminus (\mathrm{bd}(A) \cup A)$. The induced
conditional independencies are as follows. For any
disjoint subsets $A, B, C \subseteq V$, the variables in $A$ have
to be conditionally independent of those in $B$ given
$C$ whenever $C$ separates $A$ and $B$ in the graph;
we then call $P$ $G$-Markovian. As examples consider
the undirected graphs in Figure 1. In the left graph,
$X \perp\!\!\!\perp C \mid (B, Y)$ as well as $Y \perp\!\!\!\perp B \mid (C, X)$. In the right
graph of Figure 1, for instance, $B_1 \perp\!\!\!\perp B_2 \mid (C, Y)$ or
$B_1 \perp\!\!\!\perp B_2 \mid (C, X)$, showing that separating sets are
not necessarily unique.

We say that a subset $C \subseteq V$ of the nodes of a
graph is complete if each pair of nodes in $C$ is joined
by an edge. We further call such a complete $C$ a
clique if adding any further node would destroy its
completeness; that is, a clique is a maximal complete
set of nodes. In Figure 1 the graph on the right has
cliques $\{B_1, C\}$, $\{B_1, X\}$, $\{X, Y\}$ and $\{B_2, C, Y\}$.
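
The notions of separation and clique can be checked mechanically on small graphs. The following is a minimal sketch, not part of the original development: the edge list is an assumption reconstructed from the cliques listed for the right graph of Figure 1, and the helper names (`adjacency`, `cliques`, `separated`) are hypothetical. Cliques are found by brute force, which is only sensible for a handful of nodes.

```python
from itertools import combinations

# Edges of the right-hand graph of Figure 1, reconstructed here from the
# cliques listed in the text (an assumption, since the figure is not shown).
EDGES = [("B1", "C"), ("B1", "X"), ("X", "Y"),
         ("B2", "C"), ("B2", "Y"), ("C", "Y")]


def adjacency(edges):
    """Build a symmetric adjacency map from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_complete(adj, nodes):
    """True if every pair of nodes in `nodes` is joined by an edge."""
    return all(v in adj[u] for u, v in combinations(nodes, 2))


def cliques(adj):
    """Enumerate maximal complete subsets by brute force (small graphs only)."""
    nodes = sorted(adj)
    complete = [set(s) for r in range(1, len(nodes) + 1)
                for s in combinations(nodes, r) if is_complete(adj, s)]
    return [c for c in complete if not any(c < d for d in complete)]


def separated(adj, a, b, c):
    """True if every path between the sets a and b passes through the set c."""
    a, b, c = set(a), set(b), set(c)
    seen, stack = set(a), list(a)
    while stack:
        u = stack.pop()
        if u in b:
            return False
        for v in adj[u] - c - seen:   # never walk through conditioning nodes
            seen.add(v)
            stack.append(v)
    return True


ADJ = adjacency(EDGES)
FOUND_CLIQUES = sorted(sorted(c) for c in cliques(ADJ))
```

Under this assumed edge list, `separated(ADJ, {"B1"}, {"B2"}, {"C", "Y"})` and `separated(ADJ, {"B1"}, {"B2"}, {"C", "X"})` both hold, while conditioning on `{"C"}` alone leaves the path through X and Y open.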

Let $\mathcal{C}$ be the set of all cliques in $G$. The above
conditional independence restrictions that are induced
by an undirected graph go hand in hand with a
factorization of the joint distribution in terms of these
cliques. Assume $p$ is the p.d.f. (or p.m.f. if discrete)
of the random vector $X_V = (X_1, \ldots, X_K)$ taking
values $x_V = (x_1, \ldots, x_K)$; then $p$ is said to factorize
according to an undirected graph $G$ if it can be written
as

(1) $p(x_V) = \prod_{C \in \mathcal{C}} \phi_C(x_C)$,

where $\phi_C(\cdot)$ are functions that depend on $x_V$ only
through its components in $C$.

In later sections we will also make use of the notion
of an induced subgraph $G_A$, $A \subseteq V$, obtained by
removing all nodes in $V \setminus A$ and edges involving at
least one node in $V \setminus A$.

2.2 Directed Acyclic Graphs 

Directed acyclic graphs (DAGs) represent differ- 
ent sets of conditional independencies than undi- 
rected graphs. They often seem more natural if one 
thinks of a data generating process, that is, a way 
in which the data could be simulated, but DAGs 
can also be used to represent conditional indepen- 
dencies other than for a generating process. More- 
over, one could also use chain graphs (Frydenberg, 
1990; Wermuth and Lauritzen, 1990) which repre- 
sent again different sets of conditional independen- 
cies; undirected graphs as well as DAGs are special 
cases of chain graphs. 

In DAGs all edges are directed, for example, $a \to b$,
without forming any directed cycles. For $a \to b$
we say that $a$ is a parent of $b$ and $b$ is a child of $a$.
This can be generalized to sets, for example, $\mathrm{pa}(A)$,
$A \subseteq V$, denotes all the variables in $V \setminus A$ that are
parents of some variable in $A$, and similarly for the
children $\mathrm{ch}(A)$. Analogously we speak of descendants
$\mathrm{de}(A)$ of $A$, meaning all those nodes in $V \setminus A$ that
can be reached from some vertex in $A$ following the
direction of the edges, while nondescendants $\mathrm{nd}(A)$
are all other nodes (excluding $A$ itself). Further, the
ancestors $\mathrm{an}(A)$ are defined as those nodes in $V \setminus A$
from which we can reach some vertex in $A$ following
the direction of the edges.




Fig. 2. DAG (left) representing $A \perp\!\!\!\perp B$ and moral graph
(right) showing $A \not\perp\!\!\!\perp B \mid C$.



Similarly to (1), a DAG also induces a factorization
of the joint density as follows:

(2) $p(x_1, \ldots, x_K) = \prod_{i=1}^{K} p(x_i \mid x_{\mathrm{pa}(i)})$,

where $p(x_i \mid x_{\mathrm{pa}(i)})$ denotes the conditional density of
$X_i$ given all its parent variables $X_{\mathrm{pa}(i)}$. A simple
example is given in Figure 2 (left): here the joint
distribution factorizes as $p(a, b, c) = p(a) p(b) p(c \mid a, b)$.

The factorization (2) is equivalent to the following,
graphically characterized conditional independencies:

(3) $X_i \perp\!\!\!\perp X_{\mathrm{nd}(i) \setminus \mathrm{pa}(i)} \mid X_{\mathrm{pa}(i)} \quad \forall i \in V$.

For instance, in Figure 2 (left), $A$ is a nondescendant
of $B$ that has no parents, hence $A \perp\!\!\!\perp B$ is implied
by this DAG; there is no other (conditional)
independence in this particular DAG.

Even though the partial ordering imposed by the
direction of the edges on the variables $X_1, \ldots, X_K$
is often postulated to follow some causal or time
order, this does not automatically follow from the
represented conditional independencies (cf. Section
2.4). For instance $p(a, b, c) = p(a) p(c \mid a) p(b \mid c)$ implies
the same conditional independencies as
$p(a, b, c) = p(b) p(c \mid b) p(a \mid c)$, represented in the two graphs
$A \to C \to B$ and $A \leftarrow C \leftarrow B$, respectively. In both cases
$A \perp\!\!\!\perp B \mid C$. These two graphs (or factorizations) are
called Markov equivalent, meaning that exactly the
same conditional independencies can be read off.
This implies that even if we believe there is an
underlying unknown causal or other kind of ordering,
conditional independencies estimated from
observational data on $A$, $B$, $C$ cannot help to
distinguish between these graphs nor tell us the causal
order. However, depending on the way in which the
variables are observed, how a study is conducted or
other considerations, it might seem more natural to
specify $p(c \mid a)$ and $p(b \mid c)$ than $p(c \mid b)$ and $p(a \mid c)$ (we
will come back to this below). Note that the DAG
of Figure 2 is not equivalent to the former two as it
induces a different independence, $A \perp\!\!\!\perp B$.
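
Markov equivalence of the two chains can be illustrated numerically. The sketch below is only an illustration under assumed, hypothetical probability tables for binary A, B, C: it builds the joint distribution from the factorization p(a)p(c|a)p(b|c), recomputes the factors of the reversed ordering p(b)p(c|b)p(a|c) from that joint, and checks that both products describe the same distribution.

```python
from itertools import product

# Hypothetical conditional probability tables for binary A, C, B
# (values chosen for illustration only).
p_a = {0: 0.7, 1: 0.3}
p_c_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}   # p_c_given_a[a][c]
p_b_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}   # p_b_given_c[c][b]

# Joint distribution built from the factorization p(a) p(c|a) p(b|c),
# stored with keys (a, b, c).
joint = {(a, b, c): p_a[a] * p_c_given_a[a][c] * p_b_given_c[c][b]
         for a, b, c in product((0, 1), repeat=3)}


def marginal(keep):
    """Marginalize the joint onto the coordinates listed in `keep`."""
    out = {}
    for key, prob in joint.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0.0) + prob
    return out


p_b = marginal([1])          # p(b)
p_bc = marginal([1, 2])      # p(b, c)
p_c = marginal([2])          # p(c)
p_ac = marginal([0, 2])      # p(a, c)

# Rebuild the joint from the reversed factorization p(b) p(c|b) p(a|c).
rebuilt = {(a, b, c): p_b[(b,)] * (p_bc[(b, c)] / p_b[(b,)])
           * (p_ac[(a, c)] / p_c[(c,)])
           for a, b, c in product((0, 1), repeat=3)}

# Largest cell-wise discrepancy between the two factorizations.
max_gap = max(abs(joint[k] - rebuilt[k]) for k in joint)
```

The discrepancy `max_gap` vanishes up to rounding, so the data alone cannot favor one ordering over the other.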

2.3 Selection Effect and Moralization 

All conditional independencies that can be de- 
duced from (3) are given by graph separation for 
DAGs. One can either use the d-separation criterion 
(Verma and Pearl, 1988) or the moralization criterion
(Lauritzen et al., 1990). The latter is described
in detail next.^ 

The moralization criterion is used to determine
the conditional independencies of a DAG $G$ in a
collection of undirected graphs using ordinary graph
separation. These are the moral graphs on subgraphs
of $G$. Let $A \subseteq V$ and let $\mathrm{An}(A) = \mathrm{an}(A) \cup A$; then
the corresponding moral graph $G^m_{\mathrm{An}(A)}$ is given by
adding undirected edges between nodes of $\mathrm{An}(A)$
that have a common child in $\mathrm{An}(A)$ and then turning
all remaining directed edges into undirected ones.
Any conditional independence that is induced by
factorization (2) can be read off $G^m_{\mathrm{An}(A)}$ for some
$A \subseteq V$. More specifically, if we want to establish
whether $A \perp\!\!\!\perp B \mid C$, then we check for graph
separation in the undirected graph $G^m_{\mathrm{An}(A \cup B \cup C)}$. In
Figure 2, if we want to investigate whether $A \perp\!\!\!\perp B$, we
draw the moral graph $G^m_{\mathrm{An}(A \cup B)}$ which consists of
the two unconnected nodes $A$ and $B$, with node $C$
removed as it is not in $\mathrm{An}(A \cup B)$, confirming that
$A \perp\!\!\!\perp B$ as shown earlier. However, if we want to
check whether $A \perp\!\!\!\perp B \mid C$, then we draw the moral
graph $G^m_{\mathrm{An}(A \cup B \cup C)}$ shown on the right in Figure 2
and see that this independence is not implied by the
graphical model.
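
The moralization criterion translates directly into a short procedure: restrict to the ancestral set, marry parents of common children, drop directions, and check ordinary separation. The following is a minimal sketch of these steps, with hypothetical function names and a DAG stored as a map from each child to its parents; it is meant as an illustration, not as a reference implementation.

```python
def ancestral_set(parents, nodes):
    """Return An(nodes): the nodes together with all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def moral_graph(parents, nodes):
    """Moral graph on An(nodes), returned as an undirected adjacency map."""
    keep = ancestral_set(parents, nodes)
    adj = {v: set() for v in keep}
    for child in keep:
        pa = list(parents.get(child, ()))
        for p in pa:                      # drop the direction of each edge
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(pa):        # marry parents of a common child
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def dag_independent(parents, a, b, c=()):
    """Check A indep B | C by separation in the moral graph on An(A u B u C)."""
    a, b, c = set(a), set(b), set(c)
    adj = moral_graph(parents, a | b | c)
    seen, stack = set(a), list(a)
    while stack:
        u = stack.pop()
        if u in b:
            return False
        for v in adj[u] - c - seen:       # conditioning nodes block paths
            seen.add(v)
            stack.append(v)
    return True


# DAG of Figure 2 (left): A -> C <- B, stored as child -> list of parents.
FIG2 = {"C": ["A", "B"]}
```

For the DAG of Figure 2, `dag_independent(FIG2, {"A"}, {"B"})` holds because C is not in the ancestral set, whereas `dag_independent(FIG2, {"A"}, {"B"}, {"C"})` fails because the moral edge between A and B is added.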

The "moral edges" represent what is known in
epidemiology as selection or stratification (Greenland,
2003; Hernan, Hernandez-Diaz and Robins, 2004): if
$A$ and $B$ are marginally independent but $C$ depends
on both of them, as represented by the DAG in
Figure 2, then conditioning on $C$ typically induces a
dependence between $A$ and $B$. This can easily be seen
as the factorization $p(a, b, c) = p(a) p(b) p(c \mid a, b)$ does
not imply a factorization $p(a, b \mid c) = p(a \mid c) p(b \mid c)$.
As an example assume that $A$ is (binary) exposure
to a risk factor and $B$ is some disease indicator that
is entirely unrelated to $A$. Further assume that the
data are obtained from a database $C$, with $C = 1$ if
an individual is found in that database and $C = 0$
otherwise. If, for some reason, individuals who are
exposed are more likely to be in the database as well
as individuals who are ill, then we typically find an
association between $A$ and $B$ in the sample from
that database because we condition on $C = 1$. This
may, for example, happen when it is a database for
^The two criteria, d-separation and moralization, are
entirely equivalent and readers more familiar with the former
can verify all conditional independencies in the following with
d-separation.
Fig. 3. DAG (left) with moral graphs on $\mathrm{An}(B \cup X)$ (middle)
and on $\mathrm{An}(B \cup X \cup S)$ (right).

a different disease for which A is a risk factor and 
which is associated with B (cf. Berkson, 1946). In 
this example, the marginal association of A and B is 
the target of inference, but cannot be obtained from 
the available data, so that this phenomenon is often 
called selection (or stratification) bias. Note that in 
econometrics the term selection bias is also used to 
denote systematic (as opposed to randomized) selec- 
tion into treatment or exposure (Heckman, 1979), 
which in epidemiology would rather be called con- 
founding. 
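
The database example can be made concrete with a small exact calculation. In this sketch all numbers are hypothetical: exposure A and disease B are independent in the population, and the probability of being in the database, P(C = 1 | a, b), is higher for the exposed and for the ill. The odds ratio between A and B is then computed once in the population and once among those with C = 1.

```python
from itertools import product

# Hypothetical numbers: exposure A and disease B are independent in the
# population; inclusion in the database (C = 1) is more likely for the
# exposed and for the ill.
p_a = {0: 0.8, 1: 0.2}
p_b = {0: 0.9, 1: 0.1}
p_c1 = {(0, 0): 0.05, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.40}  # P(C=1 | a, b)


def odds_ratio(table):
    """Odds ratio of a 2x2 table indexed by (a, b); normalization cancels."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])


# Population table p(a, b) = p(a) p(b), and the table restricted to C = 1,
# which is proportional to p(a) p(b) P(C=1 | a, b).
population = {(a, b): p_a[a] * p_b[b] for a, b in product((0, 1), repeat=2)}
selected = {ab: population[ab] * p_c1[ab] for ab in population}

or_population = odds_ratio(population)
or_selected = odds_ratio(selected)
```

With these assumed numbers the population odds ratio is 1, while the odds ratio among database members is 1/3, an association created purely by conditioning on C = 1.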

The selection effect is equally relevant when
conditioning is not on a common child of nodes $A$ and
$B$ but on any descendant of such a common child
as this indirectly provides information on all
ancestors; for example, in Figure 3 (left), $X \perp\!\!\!\perp B$ but
$X \not\perp\!\!\!\perp B \mid S$ because $S$ carries some information on
$Y$ and $Y$ is a common child of $B$ and $X$. Figure
3 shows the corresponding moral graphs $G^m_{\mathrm{An}(B \cup X)}$
(middle) and $G^m_{\mathrm{An}(B \cup X \cup S)}$ (right) for checking these
two conditional independence statements.

2.4 Graphical Representation of Sampling 
Mechanisms 

We now turn to the question of how to represent 
with graphical models that the units in the dataset 
have possibly been sampled depending on some of 
the variables relevant to the analysis. The nodes
include $X$, the exposure or treatment, and $Y$, the
response. Additional nodes are used to represent
further relevant variables, in particular a binary
variable $S$ indicating whether the unit is sampled,
$S = 1$, or not, $S = 0$. This use of a sampling
indicator has also been proposed by Cooper (1995), 
Cox and Wermuth (1996), Geneletti, Richardson and 
Best (2009) and Lauritzen and Richardson (2008). 
Also, we may include a set of covariates C. 

The graph is normally constructed based on a 
combination of subject matter background knowl- 
edge, especially concerning the sampling mechanism, 
and testable implications. Some examples for differ- 
ent sampling mechanisms are depicted in Figure 4. 
Fig. 4. DAGs for different sampling processes: (a) independent
sampling, (b) stratified (on covariates C) sampling,
(c) case-control sampling, (d) case-control matched by C.





Fig. 5. Representing the sampling order: DAG (left) with
moral graph on $\mathrm{An}(B \cup X \cup C \cup Y)$ (right).



Note that all the graphs in Figure 4 could as well 
be undirected (replacing every directed edge by an 
undirected one) and still represent the same conditional
independencies, that is, $S \perp\!\!\!\perp (X, Y, C)$ in
(a), $S \perp\!\!\!\perp (X, Y) \mid C$ in (b), $S \perp\!\!\!\perp (X, C) \mid Y$ in (c) and
$S \perp\!\!\!\perp X \mid (Y, C)$ in (d).

For a given set of variables we speak of outcome- 
dependent sampling if Y and S are dependent what- 
ever subset of the remaining variables we condi- 
tion on, such as in (c) and (d) of Figure 4. The 
problem created by outcome-dependent sampling is 
that, first, $p(y)$ cannot be identified as the observations
are only informative for $p(y \mid S = 1)$, and second,
that conditioning on $S = 1$ might create associations
that are not present in the target population
due to the selection effect as explained in Section 2.3 
(Hernan, Hernandez-Diaz and Robins, 2004). For a 
DAG to represent the selection effect, it has to be 
constructed in a prospective way, as in Figure 4 and 
as illustrated in the following example. 

Example. In Figure 3, assume that B and C are 
baseline covariates like B = sex and C = age, while 
X is exposure to a risk factor (e.g., loud music) that 
possibly changes with age but not with sex and Y 
is a disease (e.g., hearing loss) that is affected by 
all previous variables. Assume further that we se- 
lect individuals into our study, S = 1, based on age 
and disease status (as would be the case in a case- 
control study matched by age); hence S depends 
on $C$ and $Y$. A DAG on the partially ordered variables
$(\{B, C\}, X, Y, S)$ would reflect the time order
in which the variables are assumed to be realized,
and would enable us to express, for instance, the
assumption that $B \perp\!\!\!\perp X$ as shown in Figure 3. It also
allows us to represent that the sampling induces
dependencies that are not present marginally, that is,
the selection effect. The DAG in Figure 3, for
instance, implies that the joint density of all variables
factorizes as

$p(s, y, b, c, x) = p(s \mid y, c) p(y \mid x, b, c) p(x \mid c) p(b) p(c)$.



Data from a case-control study, however, only admit
inference on the conditional distribution given
$S = 1$ which is given by

(4) $p(y, b, c, x \mid S = 1) = \frac{p(S = 1 \mid y, c) p(y \mid x, b, c) p(x \mid c) p(b) p(c)}{\sum_{y, c} p(S = 1 \mid y, c) p(c) p(y \mid c)}$.

Marginalizing this over $Y$ shows that there are no
necessary independencies among $\{X, B, C\}$ conditional
on $S = 1$, confirming that $\{X, B, C\}$ must be
complete in $G^m_{\mathrm{An}(\{X, B, C, S\})}$, the right graph of
Figure 3.

As an alternative to the prospective view, one 
could decide to represent the sampling process, that 
is, the order imposed by the sampling which will be 
retrospective under outcome-dependent sampling; in 
a case-control study, for instance, the response Y is
sampled first and hence the remaining variables are 
conditional on the response. 

Example continued. Continuing the above example,
choosing the sampled units ($S = 1$) based
on age and disease status partially reverses the order
to be $(S, \{C, Y\}, \{B, X\})$. Conditional independence
tests on the retrospective data might reveal
that $X \perp\!\!\!\perp B \mid (C, Y)$ which can be represented as
in the DAG in Figure 5 (cf. moral graph on the
right). While this conditional independence can be
tested from case-control data, the marginal independence
$B \perp\!\!\!\perp X$ postulated in Figure 3 cannot be
tested from case-control data due to the properties
of (4). (The latter could, however, be checked
approximately, when the disease is rare, using only the
controls.)

A DAG reflecting the sampling order allows us 
to encode conditional independencies given that a 
unit is sampled. While this makes it typically more 
difficult to include any subject-specific background 
knowledge about the data generating process in the 
formulation of the graph, it might result in a model 
that fits the data better and still provides consistent,
and simpler, estimates of the parameter we are
interested in. For a causal analysis, however, where we
want to be able to express prior causal assumptions, 
it is crucial that the graph reflects the prospective 
view (see Section 4.1). In particular we want to en- 
code which variables are potentially affected by an 
external change in the exposure status X by rep- 
resenting these as descendants of X in the graph. 
The causal interpretation of DAGs will be discussed 
in more detail in Section 4; until then we focus on 
graphs representing conditional independencies. 

3. COLLAPSIBILITY 

Roughly speaking, collapsibility means that inference
can be carried out on a subset of the variables,
that is, after marginalizing over others (an exact
definition for odds ratios is given below). Typically
collapsibility is exploited to reduce dimensionality and
computational effort, as it enables us to pool subgroups.
In our context collapsibility is relevant for
the opposite reason: all that we have is a subgroup, 
namely the sampled population; but we want the 
associations found in this subgroup to be valid for 
the whole target population. 

In this section we focus on the odds ratio as a 
measure of association. This is motivated by the 
fact that the odds ratio does not depend on the 
marginal distribution of Y , which is potentially af- 
fected by the sampling process, this also being the 
main reason why odds ratios are typically used for 
case-control data. We revisit general results on col- 
lapsibility of odds ratios, including their graphical 
versions, and then modify these to deal with the 
particular problem of outcome-dependent sampling. 
Note that the results concerning odds ratios given 
next are closely linked to graphical log-linear models 
(Darroch et al., 1980; Lauritzen, 1996, Chapter 4). 

3.1 Conditional Odds Ratios 

Define the conditional odds ratio $OR_{YX}(C = c)$
(in short we also write $OR_{YX}(C)$) for binary $Y$ and
binary $X$ given $C = c$ as

$\frac{p(Y = 1 \mid X = 1, C = c) \, p(Y = 0 \mid X = 0, C = c)}{p(Y = 0 \mid X = 1, C = c) \, p(Y = 1 \mid X = 0, C = c)}$.
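
As a small illustration of this definition, the sketch below computes the conditional odds ratio from a joint probability table. The table values and the function name `conditional_odds_ratio` are hypothetical. Because the factors introduced by conditioning on X and C cancel in the cross-product ratio, joint cell probabilities can be used directly.

```python
def conditional_odds_ratio(joint, c):
    """OR_YX(C = c) from a joint table joint[(x, y, c)] with binary x, y.

    Conditioning on X and C only rescales the cells, and those factors
    cancel in the cross-product ratio, so joint cell values can be used.
    """
    numerator = joint[(1, 1, c)] * joint[(0, 0, c)]
    denominator = joint[(1, 0, c)] * joint[(0, 1, c)]
    return numerator / denominator


# Hypothetical joint probabilities p(x, y, c) for illustration only.
joint = {
    (0, 0, 0): 0.20, (0, 1, 0): 0.05, (1, 0, 0): 0.10, (1, 1, 0): 0.05,
    (0, 0, 1): 0.15, (0, 1, 1): 0.15, (1, 0, 1): 0.10, (1, 1, 1): 0.20,
}

or_c0 = conditional_odds_ratio(joint, 0)
or_c1 = conditional_odds_ratio(joint, 1)
```

In this assumed table both strata have a conditional odds ratio of 2.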

It is straightforward to generalize this for the case
where $Y$ and $X$ have more than two categories.
We might then consider a collection of (conditional)
odds ratios comparing the probabilities for $Y = y$
versus a reference category $y = 0$ for values of $X$,
say $X = x$ versus $X = 0$. This collection of odds
ratios fully characterizes the (conditional) dependence
between $X$ and $Y$ and is, vice versa, fully
determined by the corresponding interaction terms of a
log-linear model. The results given below can therefore
easily be extended to the case of more than two
categories.

3.2 General Results 

It is well known that the conditional odds ratio
$OR_{YX}(C)$ is not necessarily the same as when we
collapse over $C$, that is, as the marginal $OR_{YX}$,
even if $OR_{YX}(C = c) = OR_{YX}(C = c')$ for all $c \neq c'$.
Though this property is at the heart of most
definitions of collapsibility, there are some subtleties
giving rise to various definitions of, and different
conditions that are sufficient and sometimes
necessary for, collapsibility (Bishop, Fienberg and
Holland, 1975; Whittemore, 1978; Shapiro, 1982; Davis,
1986; Ducharme and Lepage, 1986; Wermuth, 1987;
Geng, 1992; Guo, Geng and Fung, 2001). Here we
define collapsibility as follows.

Definition 1. Consider two binary variables $X$
and $Y$ and disjoint sets of further variables $B$ and
$C$. We say that the odds ratio $OR_{YX}(B, C)$ given
$B$ and $C$ is collapsible over $B$ if
$OR_{YX}(B = b, C = c) = OR_{YX}(B = b', C = c) = OR_{YX}(C = c)$,
for all $b \neq b'$ and $c$.

Note that in the above definition as well as in all 
the following results, the covariates C can be of arbi- 
trary measurement level, while X, Y and the covari- 
ates we consider to collapse over, B, are categorical. 
Collapsibility can then be ensured as follows.

Theorem 2. Sufficient conditions for the conditional
odds ratio $OR_{YX}(B, C)$ to be collapsible over
$B$ are:

(i) $Y \perp\!\!\!\perp B \mid (C, X)$ or

(ii) $X \perp\!\!\!\perp B \mid (C, Y)$.

Proof. This follows from the work of Whitte- 
more (1978). □ 
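
Condition (i) of Theorem 2 can be checked numerically on a constructed example. The sketch below uses hypothetical probability tables for binary B, C, X, Y in which p(y | x, b, c) does not depend on b, so that Y is independent of B given (C, X). It then compares the odds ratio within each stratum of (B, C) with the odds ratio obtained after summing over B.

```python
from itertools import product

# Hypothetical conditional probabilities for binary B, C, X, Y, chosen so
# that Y is independent of B given (C, X): p(y | x, b, c) = p(y | x, c).
p_c = {0: 0.6, 1: 0.4}
p_b1 = {0: 0.3, 1: 0.7}                                      # P(B=1 | c)
p_x1 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.8}  # P(X=1 | b, c)
p_y1 = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.4, (1, 1): 0.5}  # P(Y=1 | x, c)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Joint distribution with keys (b, c, x, y).
joint = {(b, c, x, y): p_c[c] * bern(p_b1[c], b) * bern(p_x1[(b, c)], x)
         * bern(p_y1[(x, c)], y)
         for b, c, x, y in product((0, 1), repeat=4)}


def odds_ratio_yx(cells):
    """Cross-product ratio of a table cells[(x, y)]."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


def or_given(c, b=None):
    """OR_YX(B=b, C=c), or OR_YX(C=c) after summing over B when b is None."""
    cells = {}
    for (bb, cc, x, y), prob in joint.items():
        if cc == c and (b is None or bb == b):
            cells[(x, y)] = cells.get((x, y), 0.0) + prob
    return odds_ratio_yx(cells)


# Differences between stratum-specific and collapsed odds ratios.
gaps = [abs(or_given(c, b) - or_given(c)) for c in (0, 1) for b in (0, 1)]
```

All entries of `gaps` vanish up to rounding, in line with the theorem; replacing `p_y1` by a table that also depends on b would in general break this equality.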

Remarks. (a) The conditions in Theorem 2 are
necessary if $B$ is a single binary variable (Whittemore,
1978).

(b) The conditions in Theorem 2 also ensure, and
are necessary for, strong collapsibility which posits
that the equality holds for any newly defined $B'$
obtained by merging categories of $B$ (Ducharme and
Lepage, 1986; Davis, 1986).



GRAPHICAL MODELS FOR OUTCOME DEPENDENT SAMPLING 






The conditions in Theorem 2 are not necessary in general;
the following corollary gives a more general result.

Corollary 3. Assume that $B$ can be partitioned
into $(B_1, \ldots, B_K)$, and let $B^{k+1} = (B_{k+1}, \ldots, B_K)$.

If $B_k$ satisfies for each $k = 1, \ldots, K$, either:

(i) $Y \perp\!\!\!\perp B_k \mid (C, X, B^{k+1})$ or

(ii) $X \perp\!\!\!\perp B_k \mid (C, Y, B^{k+1})$,

then $OR_{YX}(B, C)$ is collapsible over $B$.

Proof. With Theorem 2, $OR_{YX}(B_k, \ldots, B_K, C)$
is collapsible over $B_k$, $k = 1, \ldots, K$. Hence we can
consecutively collapse over $B_1, \ldots, B_K$. □

The conditions of Theorem 2, generalized in Corollary 3,
can be checked graphically as they correspond
to simple separations in graphical models, regardless
of whether an undirected graph, a DAG or a
chain graph is used to model the data. We consider 
the case of undirected graphs next; these could also 
be the moral graphs derived from DAGs or chain 
graphs. 

Example. When $B$ is not partitioned, the graphical
equivalents of the two conditions in the above
corollary are given in Figure 6, where each of the
graphs could have fewer edges but not more. An
example where $B = \{B_1, B_2\}$ and the odds ratio is
collapsible over $B$ is given in Figure 1 (right); $B_1$ satisfies (i)
and $B_2$ satisfies (ii) of Corollary 3. The conditions of
Theorem 2 can easily be checked on DAGs as well.
For instance, in Figure 7, the graph in (a) satisfies
(i) and (b) satisfies (ii) of the theorem (their moral
graphs are exactly those in Fig. 6). In contrast, (c)
cannot be collapsed over $B$ as the moral graph in (d)
shows that neither $B \perp\!\!\!\perp Y \mid (X, C)$ nor $B \perp\!\!\!\perp X \mid (Y, C)$
holds in general. Note that the marginal independence
$X \perp\!\!\!\perp B$ in this DAG does not help with
respect to collapsing the odds ratio over $B$.

Theorem 2 implies that even if we ignore B we can 
still obtain consistent estimates for the conditional 
odds ratio. However, it does not ensure that the ac- 
tual value of the ML-estimate of ORyx{C) is the 
same in the model where B is ignored as opposed to 



Fig. 6. Left graph satisfies (i), right graph satisfies (ii) of
Corollary 3.

Fig. 7. Graphs (a) and (b) satisfy Theorem 2, while graph
(c) violates the conditions [moral graph in (d)].



when B is included; this is another type of collapsi- 
bility (cf. Asmussen and Edwards, 1983, Lauritzen, 
1982, and the discussion by Kreiner, 1987; for DAGs 
see Kim and Kim, 2006 and Xie and Geng, 2009; for
chain graphs Didelez and Edwards, 2004). 

3.3 Collapsibility Under Outcome-Dependent 
Sampling 

Now, we investigate collapsibility over $S$ because
in the available data we have $S = 1$, so all we can
estimate is necessarily conditional on $S = 1$. Hence, we
want to ensure that our estimate for $OR_{YX}(C, S = 1)$
applies to the whole target population, that is, is
consistent for $OR_{YX}(C)$.

Corollary 4. The conditional odds ratio
$OR_{YX}(C, S)$ is collapsible over $S$ if and only if
$Y \perp\!\!\!\perp S \mid (C, X)$ or $X \perp\!\!\!\perp S \mid (C, Y)$.

Proof. This follows from Theorem 2 and Remark
(a) (see Whittemore, 1978). □
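
A numerical illustration of Corollary 4 for a case-control design is sketched below, with hypothetical numbers throughout. The population follows p(c)p(x|c)p(y|x,c) and the sampling probability P(S = 1 | y, c) depends only on outcome and covariate, so that X is independent of S given (C, Y). The conditional odds ratio is then compared between the population and the sampled units, alongside the outcome prevalence, which is not preserved.

```python
from itertools import product

# Hypothetical population model p(c) p(x|c) p(y|x,c) and sampling
# probabilities P(S=1 | y, c) as in a case-control study stratified on C.
p_c = {0: 0.5, 1: 0.5}
p_x1 = {0: 0.3, 1: 0.6}                                          # P(X=1 | c)
p_y1 = {(0, 0): 0.02, (1, 0): 0.05, (0, 1): 0.04, (1, 1): 0.12}  # P(Y=1 | x, c)
p_s1 = {(0, 0): 0.01, (1, 0): 0.90, (0, 1): 0.02, (1, 1): 0.80}  # P(S=1 | y, c)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Population distribution with keys (c, x, y).
population = {(c, x, y): p_c[c] * bern(p_x1[c], x) * bern(p_y1[(x, c)], y)
              for c, x, y in product((0, 1), repeat=3)}
# Unnormalized distribution of the sampled units: p(c, x, y) P(S=1 | y, c).
sample = {(c, x, y): prob * p_s1[(y, c)]
          for (c, x, y), prob in population.items()}


def or_yx(table, c):
    """Conditional odds ratio OR_YX(C = c) from a table keyed by (c, x, y)."""
    return (table[(c, 1, 1)] * table[(c, 0, 0)]) / (table[(c, 1, 0)] * table[(c, 0, 1)])


def prevalence(table):
    """P(Y = 1) in the given (possibly unnormalized) table."""
    total = sum(table.values())
    return sum(p for (c, x, y), p in table.items() if y == 1) / total


or_gaps = [abs(or_yx(population, c) - or_yx(sample, c)) for c in (0, 1)]
prev_population = prevalence(population)
prev_sample = prevalence(sample)
```

The entries of `or_gaps` vanish up to rounding, whereas `prev_sample` is far larger than `prev_population`: the odds ratio is collapsible over S, but the marginal distribution of Y is not recoverable from the sample.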

Similarly to Geneletti, Richardson and Best (2009), 
we can call a set of variables C satisfying Corol- 
lary 4 bias-breaking because it allows us to estimate 
$OR_{YX}(C)$ consistently. As addressed in Section 2.3,
conditioning on $S$ when no such $C$ can be found and
S depends on both X and Y will typically induce 
an association even when there is no association be- 
tween X and Y , marginally or conditionally on co- 
variates; and if there is an association between X 
and Y, then conditioning on S will typically change 
this association so that estimates based on the se- 
lected data may be biased. 

Example continued. In both DAGs, Figures
3 and 5, we can collapse $OR_{YX}(B, C, S)$ over $S$ as
$X \perp\!\!\!\perp S \mid (B, C, Y)$. We can also collapse, in both
graphs, $OR_{YX}(C, S)$ over $S$ as $X \perp\!\!\!\perp S \mid (C, Y)$.

A consequence of Corollary 4 is that in a typical 
matched case-control study, the exposure-response 
odds ratio is only collapsible over the sampling if we 
condition on the matching variables, even if these 
are not marginally associated with exposure. This 

Fig. 8. DAG for a matched (on B) case-control study (left)
and moral graph (right).

is illustrated in Figure 8, where sampling depends
on the outcome $Y$ as well as on a matching variable
$B$, while $X \perp\!\!\!\perp B$. Here, we cannot collapse over $S$
if $B$ is ignored, as neither $X \perp\!\!\!\perp S \mid Y$ (as can be seen
from the moral graph, where the common child $Y$
induces an additional edge between $X$ and $B$ opening
a path to $S$) nor obviously $Y \perp\!\!\!\perp S \mid X$ holds. However,
$X \perp\!\!\!\perp S \mid (B, Y)$ so that the odds ratio is collapsible
over $S$ if it is conditional on $B$.
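
The matched design can also be explored numerically. The sketch below assumes the structure described for Figure 8 (B and X both affect Y, sampling depends on Y and B, and X and B are marginally independent); all probabilities are hypothetical. It compares odds ratios between the population and the sample, first within strata of B and then with B ignored.

```python
from itertools import product

# Hypothetical numbers for the DAG of Figure 8: B -> Y <- X, B -> S <- Y,
# with X and B marginally independent.
p_b = {0: 0.5, 1: 0.5}
p_x = {0: 0.6, 1: 0.4}
p_y1 = {(0, 0): 0.05, (1, 0): 0.15, (0, 1): 0.20, (1, 1): 0.50}  # P(Y=1 | x, b)
p_s1 = {(0, 0): 0.05, (1, 0): 1.00, (0, 1): 0.40, (1, 1): 1.00}  # P(S=1 | y, b)

# Population distribution with keys (b, x, y).
population = {}
for b, x, y in product((0, 1), repeat=3):
    p_y = p_y1[(x, b)] if y == 1 else 1.0 - p_y1[(x, b)]
    population[(b, x, y)] = p_b[b] * p_x[x] * p_y
# Unnormalized distribution of the sampled units.
sample = {(b, x, y): p * p_s1[(y, b)] for (b, x, y), p in population.items()}


def or_yx(table, b=None):
    """OR_YX within stratum B=b, or after summing over B when b is None."""
    cells = {}
    for (bb, x, y), p in table.items():
        if b is None or bb == b:
            cells[(x, y)] = cells.get((x, y), 0.0) + p
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


# Within strata of B the sample reproduces the population odds ratio ...
stratum_gaps = [abs(or_yx(population, b) - or_yx(sample, b)) for b in (0, 1)]
# ... but with B ignored the two odds ratios differ.
marginal_gap = abs(or_yx(population) - or_yx(sample))
```

With these assumed numbers `stratum_gaps` vanishes up to rounding while `marginal_gap` is clearly positive, even though X and B are marginally independent in the population.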

In addition to collapsing over $S$, we might want
to reduce dimensionality of covariates, for example,
to improve stability of estimates of odds ratios
(Robinson and Jewell, 1991). This is possible
if the set of covariates can be written as $(B, C)$
such that $OR_{YX}(B, C, S)$ is collapsible over $S$ and
$B$ in a way such that an estimate for $OR_{YX}(C, S = 1)$
is consistent for $OR_{YX}(B, C)$. The following is
straightforward from Theorem 2 and assumes that
there is outcome-dependent sampling, that is,
$Y \not\perp\!\!\!\perp S \mid (B, C, X)$ so that, unlike Corollary 4, the next
corollary is not symmetric in $X$ and $Y$.

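Corollary 5. The odds ratio $OR_{YX}(B, C, S)$
is collapsible over $S$, over $(B, S)$ and over $B$ if
$X \perp\!\!\!\perp S \mid (Y, B, C)$ and:

(i) $X \perp\!\!\!\perp B \mid (Y, C)$ or

(ii) $Y \perp\!\!\!\perp B \mid (X, C)$ and $X \perp\!\!\!\perp S \mid (Y, C)$.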

Proof. First note that $X \perp\!\!\!\perp S \mid (Y, B, C)$ yields
$OR_{YX}(B, C, S)$ collapsible over $S$. For part (i)
additionally, $X \perp\!\!\!\perp B \mid (Y, C)$ yields $OR_{YX}(B, C)$ collapsible
over $B$ by Theorem 2. Both conditional independencies
together imply that $X \perp\!\!\!\perp S \mid (Y, C)$ which finally
yields $OR_{YX}(C, S)$ collapsible over $S$, so that
all these are equal to $OR_{YX}(C)$. For part (ii) we see
that $OR_{YX}(B, C)$ is collapsible over $B$ due to
$Y \perp\!\!\!\perp B \mid (X, C)$ and we further have that $OR_{YX}(C, S)$ is
collapsible over $S$ due to $X \perp\!\!\!\perp S \mid (Y, C)$. This yields
the desired result. □

The conditions of Corollary 5 can again be checked 
on graphical models by corresponding separations; 
see Figure 9. As before, if $B$ can be appropriately
partitioned, Corollary 5 can be applied successively
to the subsets $B_k$. When DAGs are used to check



Fig. 9. Illustrations of Corollary 5: left graph satisfies 
X IL S\{Y, B,C) , middle graph satisfies condition (i) and 
right graph satisfies condition (ii). 

Corollary 5, then the moral graph(s) have to be 
identical to or have fewer edges than those in Figure 
9. The DAGs in Figures 3 and 5 serve as examples 
for the prospective and retrospective approaches, re- 
spectively. From both we infer that we can collapse 
over S, but only the second one also satisfies (i) of 
Corollary 5. In contrast it can be seen from Fig- 
ure 3 that the conditions (i) and (ii) of Corollary 5 
will not be satisfied in a DAG that represents the 
prospective view and where B as well as X point at 
Y. 
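Separations of this kind can be checked mechanically. The sketch below assumes the usual moralisation criterion (restrict the DAG to the smallest ancestral set containing the variables involved, moralise, then look for a path that avoids the conditioning set). The function names are hypothetical, and the example DAG is an assumed reading of the Figure 8 description (X and B point at Y; Y and B point at S), not a transcription of the figure.

```python
def ancestors(dag, nodes):
    """Nodes in `nodes` plus all their ancestors; dag maps child -> set of parents."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(dag.get(v, ()))
    return seen


def separated(dag, a, b, given):
    """True if `given` separates a from b in the moral graph of the
    smallest ancestral set containing a, b and `given`."""
    keep = ancestors(dag, {a, b} | set(given))
    nbrs = {v: set() for v in keep}
    for child in keep:
        parents = list(dag.get(child, ()))
        for p in parents:                      # keep the parent-child edges
            nbrs[child].add(p)
            nbrs[p].add(child)
        for i, p in enumerate(parents):        # marry parents of a common child
            for q in parents[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    seen, stack = set(given), [a]              # search that never enters `given`
    while stack:
        v = stack.pop()
        if v == b:
            return False
        if v not in seen:
            seen.add(v)
            stack.extend(nbrs[v] - seen)
    return True


# Assumed structure for the Figure 8 discussion.
FIG8 = {"Y": {"X", "B"}, "S": {"Y", "B"}}
print(separated(FIG8, "X", "S", {"Y"}))        # path X - B - S remains open
print(separated(FIG8, "X", "S", {"B", "Y"}))   # blocked once B is also given
```

The same function can be applied to any of the DAGs in this article by encoding each node's parents in the dictionary.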

3.4 Testability of the Null Hypothesis 

In general data situations, for example, when Y is 
continuous, the odds ratio is not necessarily an ap- 
propriate measure of dependence. Other measures 
are typically not identified under outcome-dependent 
sampling without further assumptions. However, a 
result analogous to Theorem 2 can still be obtained 
if we restrict ourselves to the question whether the 
(conditional) independence of X and Y, possibly 
given covariates C, can still be tested under outcome- 
dependent sampling. This conditional independence 
is often the null hypothesis of interest. 

Theorem 6. If $S \perp\!\!\!\perp Y \mid (C, X)$ or
$S \perp\!\!\!\perp X \mid (C, Y)$, then

$$Y \perp\!\!\!\perp X \mid C \iff Y \perp\!\!\!\perp X \mid (C, S = 1).$$

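Proof. By the properties of conditional independence (Dawid, 1979) we
have that $Y \perp\!\!\!\perp X \mid C$ together with
$S \perp\!\!\!\perp Y \mid (C, X)$ [or $S \perp\!\!\!\perp X \mid (C, Y)$]
immediately implies $Y \perp\!\!\!\perp X \mid (C, S)$. Now assume that
$Y \perp\!\!\!\perp X \mid (C, S = 1)$ and
$S \perp\!\!\!\perp Y \mid (C, X)$; then

$$p(y \mid x, c) = \sum_s p(y \mid s, x, c)\, p(s \mid x, c) = \sum_s p(y \mid S = 1, c)\, p(s \mid x, c),$$

which is just $p(y \mid S = 1, c)$. If instead
$S \perp\!\!\!\perp X \mid (C, Y)$, an analogous argument yields
$p(x \mid y, c) = p(x \mid S = 1, c)$. This completes the proof. □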
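Theorem 6 can be illustrated with a small exact calculation for binary X, Y and C, where conditional independence within a stratum is equivalent to a stratum-specific odds ratio of one. This is a minimal sketch with hypothetical probabilities; the two selection mechanisms are assumptions made for the illustration.

```python
import math
from itertools import product

P_C = [0.5, 0.5]                     # p(c)
P_X_C = [[0.8, 0.2], [0.4, 0.6]]     # p(x | c), indexed [c][x]
P_Y_C = [[0.9, 0.1], [0.7, 0.3]]     # p(y | c), indexed [c][y]; the null holds


def selected_table(p_select):
    """Unnormalised p(c, x, y, S=1); p_select(c, x, y) is P(S=1 | c, x, y)."""
    return {(c, x, y): P_C[c] * P_X_C[c][x] * P_Y_C[c][y] * p_select(c, x, y)
            for c, x, y in product((0, 1), repeat=3)}


def max_abs_log_or(table):
    """Largest |log OR_YX| over the strata of C (zero iff X and Y are
    independent given C in a table of binary variables)."""
    return max(abs(math.log(table[c, 1, 1] * table[c, 0, 0]
                            / (table[c, 1, 0] * table[c, 0, 1])))
               for c in (0, 1))


def by_c_and_y(c, x, y):
    """Selection depends on (C, Y) only, so S is independent of X given (C, Y)."""
    return [[0.02, 0.6], [0.05, 0.9]][c][y]


def by_x_and_y(c, x, y):
    """Selection depends jointly on X and Y: both conditions of Theorem 6 fail."""
    return [[0.1, 0.2], [0.3, 0.9]][x][y]


ok = max_abs_log_or(selected_table(by_c_and_y))
bad = max_abs_log_or(selected_table(by_x_and_y))
print(ok, bad)
```

In the first case the null hypothesis is preserved in the selected sample; in the second, selection alone induces an association between X and Y.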

Hence, under the assumptions of Theorem 6 we can test the null
hypothesis of no (conditional) association between exposure and
response even under outcome-dependent sampling by any appropriate
test for $Y \perp\!\!\!\perp X \mid (C, S = 1)$. Note that S has to
satisfy the same conditions as for collapsibility of the odds ratio in
Theorem 2; this can be explained by the one-to-one relation between
(conditional) independence and vanishing mixed derivative measures of
interaction (Whittaker, 1990, page 35) which are a generalization of
odds ratios to continuous distributions.

3.5 Application: Hormone Replacement Therapy 
(HRT) and Transient Cerebral Ischemia 
(TCI) 

We illustrate the above with a simplified version 
of the analysis of Pedersen et al. (1997). The data 
are from a case-control study, where the disease of 
interest is transient cerebral ischemia (TCI) and the
main risk factor is use of hormone replacement ther- 
apy (HRT). Controls are matched by age. Further, 
smoking status (Smo), occupation (Occ) and his- 
tory of other thromboembolic disorders (THist) are 
included. All covariates here are categorical; in par- 
ticular HRT is measured with categories "never" 
(the reference category), "former," "oestrogen" and 
"combined." The target of inference is the TCI- 
HRT odds ratio conditional on all covariates. Un- 
der what additional assumptions this can be given a 
causal interpretation will be addressed explicitly in 
the next section. 

Assume the conditional independencies represented 
in the DAG in Figure 10. The additional knowledge 
that the actual study design is case-control matched 
by age is easily included by drawing arrows from Age 
and TCI into the additional node S, as in Figure 11. 
Note that the assumptions implied by the subgraph 
on the covariates are supported by the data from 
the controls only and extrapolated to hold for the 
whole population. 

The moral graph on all variables is shown in Figure 12. This
represents the conditional independence structure that we would expect
to see in the data (i.e., conditional on S = 1). Note that a
particular feature of the age matching is that
$\mathrm{TCI} \perp\!\!\!\perp \mathrm{Age} \mid S = 1$, but as this is
not necessarily the case for S = 0 we have to leave the edge TCI-Age
in the moral graph. (It is clear that the TCI-Age odds ratio cannot be
estimated from case-control data matched by age; formally, it is not
collapsible over S.)

Fig. 11. DAG as in Figure 10 but now including selection node S to
reflect matched case-control sampling.

It is obvious from the moral graph as well as from the design of the
study that, given TCI and Age, all other variables are independent of
the sampling indicator S. In particular
$\mathrm{HRT} \perp\!\!\!\perp S \mid (\mathrm{TCI}, \mathrm{Age}, \mathrm{Occ}, \mathrm{Smo}, \mathrm{THist})$,
so that Corollary 4 is satisfied, meaning that the conditional TCI-HRT
odds ratio (given all covariates) based on the selected sample is
consistent for the one in the population. In addition we see that
B = Occ is independent of TCI given Age, Smo, THist and HRT, so that
condition (ii) of Corollary 5 holds and we can ignore the occupation
of a person when estimating the conditional odds ratio between TCI and
HRT.

The actual calculation of the desired odds ratio 
can be carried out by fitting a log-linear model on 
the subgraph of Figure 12 excluding the selection 
node S (Darroch, Lauritzen and Speed, 1980; Lau- 
ritzen, 1996, Chapter 4). The desired conditional 
TCI-HRT odds ratio is a function of the interac- 
tion parameters in this model. For this dataset, we 
obtain the log (conditional) odds ratios given in Ta- 
ble 1 (there is no evidence that these are different in 
the subgroups defined by the conditioning variables 
Age, THist, Smo). 

Earlier we assumed that the conditional TCI-HRT odds ratio given all
other covariates is the target of interest. If for some reason instead
one wants to condition only on a subset of {Occ, Smo, THist, Age},
this is still collapsible over S as long as Age is included in that
subset.

Fig. 10. Prospective model for HRT-TCI example.

Fig. 12. Moral graph on all nodes of Figure 11.

Table 1
Conditional odds ratio between HRT and TCI given {Age, THist, Smo}
based on separations in the DAG of Figure 11

HRT level     log-OR (stdev)    OR
Never         Reference
Former        0.64 (0.16)       1.90
Oestrogen     0.73 (0.21)       2.07
Combined      0.26 (0.19)       1.29
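The relation between the log-OR and OR columns of Table 1 can be reproduced with a few lines. The sketch below additionally computes approximate 95% Wald intervals; this assumes that the reported "stdev" is the standard error of the log odds ratio and that a normal approximation is adequate, neither of which is stated in the text. The function name is hypothetical.

```python
import math

# (log odds ratio, reported stdev) per HRT level, as given in Table 1.
TABLE_1 = {"Former": (0.64, 0.16), "Oestrogen": (0.73, 0.21), "Combined": (0.26, 0.19)}


def odds_ratio_with_interval(log_or, se, z=1.96):
    """Odds ratio and an approximate 95% Wald interval on the odds-ratio scale.

    Assumes `se` is the standard error of the log odds ratio.
    """
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


for level, (log_or, se) in TABLE_1.items():
    estimate, lower, upper = odds_ratio_with_interval(log_or, se)
    print(f"{level:10s} OR = {estimate:.2f}  (approx. 95% CI {lower:.2f} to {upper:.2f})")
```

Because the tabulated log odds ratios are rounded to two decimals, the exponentiated values can differ from the OR column in the last digit.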

An alternative approach is to use a DAG that 
factorizes retrospectively according to the sampling 
process so that S (and hence TCI and Age) are the 
initial variables, taking into account that observa- 
tions are conditional on being sampled in the first 
place. Assume the conditional independencies repre- 
sented in the DAG in Figure 13 which is supported 
by the data. Collapsibility over S (given the covari- 
ates) is of course still satisfied as this is implied by 
the design and still reflected in the model assump- 
tions encoded by the graph. 

From the moral graph in Figure 14 we now have 
that HRT is conditionally independent of THist 
given the remaining variables so that condition (i) 
of Corollary 5 is satisfied with B = THist. The con- 
ditional TCI-HRT odds ratio given Occ, Smo, Age 
estimated from the case-control data is now consis- 
tent for the desired odds ratio in the target popula- 
tion. The results are similar to the first model as can 
be seen from Table 2. They are not exactly the same 
as the model assumptions of Figures 11 and 13 are 
indeed different, but they are both consistent, under 
their respective model, for the same odds ratio given 
all covariates. 

A standard analysis based on a logistic regression of TCI on
explanatory variables Age, Occ, Smo, THist, HRT implicitly assumes the
model in Figure 15, that is, all covariates are parents of TCI. While
the logistic regression does not make assumptions about the relations
between the covariates, we have drawn the graph assuming they are
mutually independent. This is to demonstrate that the moral graph,
given in Figure 16, in any case has all covariates forming a complete
subgraph, that is, there are no conditional independencies given TCI.
The results can therefore be different from the above analyses, as
conditional independencies involving the covariates cannot be
exploited to collapse over either Occ or THist. Adjusting for more
covariates than necessary can lead to larger standard errors in
logistic regressions (see Robinson and Jewell, 1991), but this happens
not to be the case here; see Table 3.

Fig. 13. DAG reflecting the sampling process for the HRT-TCI example.

Fig. 14. Moral graph for Figure 13.

Table 2
Conditional odds ratio between HRT and TCI given {Age, Occ, Smo}
based on separations in the DAG of Figure 13

HRT level     log-OR (stdev)    OR
Never         Reference
Former        0.66 (0.16)       1.93
Oestrogen     0.74 (0.21)       2.10
Combined      0.28 (0.19)       1.32

Fig. 16. Moral graph for Figure 15.

Table 3
Conditional odds ratio between HRT and TCI given {Age, Occ, Smo,
THist} from a logistic regression

HRT level     log-OR (stdev)    OR
Never         Reference
Former        0.66 (0.16)       1.93
Oestrogen     0.76 (0.21)       2.14
Combined      0.28 (0.19)       1.32

In a more realistic analysis there will be more vari- 
ables to be taken into account, such as menopause, 
other medical conditions (hypertension, diabetes, 
heart diseases) and body mass index (see Peder- 
sen et al., 1997), so that logistic regression produces 
larger standard errors. The graphical approach based 
on Corollary 5 can help to reduce the set of covari- 
ates to be adjusted for. 

4. CAUSAL EFFECTS OF INTERVENTIONS 

So far we have regarded the conditional odds ratio given all
covariates as the target measure of association between X and Y.
However, in many situations one is interested in the causal effect of
X on Y, not just the association. A causal effect is meant to
represent the effect that manipulations or interventions in X have on
Y, as opposed to the mere observation of different X values. Hence we
define the causal effect formally as the effect of an intervention.
Our approach goes back to the work of Spirtes, Glymour and Scheines
(1993) and Pearl (1993) and is detailed in the article by Dawid (2002)
(see also Lauritzen, 2000; Dawid and Didelez, 2010). We define an
indicator $\sigma_X$ for an intervention in X, where $\sigma_X$
indicates either that X is being set to a value $x \in \mathcal{X}$ in
the domain of X, or that X arises naturally. In the former case we
write $\sigma_X = x$, $x \in \mathcal{X}$, and in the latter
$\sigma_X = \emptyset$. More precisely,

$$p(x' \mid W; \sigma_X = x) = \delta\{x = x'\}, \tag{5}$$

where W can be any set of additional variables and $\delta$ is the
indicator function. Hence X is independent of any other variable when
$\sigma_X = x$. In contrast, $p(x' \mid W; \sigma_X = \emptyset)$ is
the conditional distribution of X given W that we observe when no
intervention takes place, that is, if X arises naturally. More
generally one may be interested in other types of interventions, for
example, where (5) is a probability or depends on W (Dawid and
Didelez, 2010; Didelez et al., 2006), but we do not consider these in
more detail here. The above approach is related to the potential
outcomes framework (Rubin, 1974, 1978; Robins, 1986), in that the
distribution of the outcome Y under an intervention,
$p(y \mid \sigma_X = x)$, corresponds to the distribution of the
potential outcome $Y_x$. A comparison of different causal frameworks
can be found in the work of Didelez and Sheehan (2007b). We also call
the situation $\sigma_X = \emptyset$ the observational regime and the
situation $\sigma_X = x$, for some $x \in \mathcal{X}$, the
experimental or interventional regime.

4.1 Influence Diagrams and Causal DAGs 

The indicator $\sigma_X$ must be regarded as a decision variable or
parameter, not as a random variable, and hence every statement about
the system under investigation must be made conditional on the value
of $\sigma_X$. We will use conditional independence statements of the
type "A is independent of $\sigma_X$ given B," or
$A \perp\!\!\!\perp \sigma_X \mid B$, meaning that the conditional
distribution of A given B is the same under observation and any
setting of X. With this notion of conditional independence applied to
the intervention indicator, we can then also include $\sigma_X$ into
our DAG representation of a data situation in order to encode which
variables are conditionally independent of $\sigma_X$ in the above
sense. As $\sigma_X$ is not a random variable but a decision variable
it is graphically represented in a box and the resulting DAG is called
an influence diagram (Dawid, 2002); cf. Figure 17 for an example.

The following points are important when construct- 
ing an influence diagram. 

(1) As the decision to intervene in X immediately affects its
distribution, $\sigma_X$ has to be a graph parent of X, while
$\sigma_X$ itself has no parents as it is a decision node.

(2) Hence, any variables that are nondescendants of $\sigma_X$ are
assumed independent of $\sigma_X$, that is, they are not affected by
an intervention in X. Such variables are often called "pre-treatment"
or "baseline" covariates, such as age, gender, etc. Figure 17, for
instance, encodes the assumption that the distribution of age,
occupation, smoking status and prior thromboembolic disorders does not
change if HRT is manipulated, while TCI is potentially affected. Thus,
by representing variables as nondescendants or descendants of
$\sigma_X$ we can explicitly distinguish between variables that are
known a priori not to be affected by an intervention in X and those
that are. It is therefore not sensible to add such an intervention
node to a retrospective graph such as Figure 13, as important prior
knowledge about what is and is not potentially affected by an
intervention in X could then not be represented. Retrospective graphs
encode a different set of assumptions that can be used to justify
collapsibility as illustrated in Section 3.5 in order to apply
condition (i) of Corollary 5, for instance.

(3) Finally, X being the only child of $\sigma_X$ encodes the
assumption that variables that are potentially affected by an
intervention (i.e., descendants of X) are conditionally independent of
$\sigma_X$ given $(X, \mathrm{pa}(X))$. Justification of this
assumption requires us to make the system "rich" enough, often by
including unobservable variables. Figure 17 assumes that
$\mathrm{TCI} \perp\!\!\!\perp \sigma_{HRT} \mid (\mathrm{Age}, \mathrm{Smo}, \mathrm{HRT})$.
This means that once we know age and smoking status of a person and,
for example, that she is not taking HRT, then it does not matter in
terms of predicting TCI whether this is by choice or for instance
because HRT is banned from the market. This assumption has to be
scrutinized with regard to the particular intervention that is
considered and variables that are taken into account. If, for example,
smoking status was unobserved and omitted from the graph, then the
absence of an edge from $\sigma_{HRT}$ to TCI in the new graph might
not be justifiable, as
$\mathrm{TCI} \not\perp\!\!\!\perp \sigma_{HRT} \mid (\mathrm{Age}, \mathrm{HRT})$
if Figure 17 is correct (see moral graph in Figure 19). We might even
doubt the independence
$\mathrm{TCI} \perp\!\!\!\perp \sigma_{HRT} \mid (\mathrm{Age}, \mathrm{Smo}, \mathrm{HRT})$
in Figure 17, for example, if it is thought that socioeconomic
background predicts HRT and TCI in a way not captured by {Age, Smo}.

Fig. 17. Influence diagram for TCI example, prospectively.

With an influence diagram constructed as above, the distribution of
all variables under an intervention $\sigma_X = x'$ is given by (2)
with the only modification that $p(x \mid \mathrm{pa}(x))$ is replaced
by $\delta\{x = x'\}$ due to (5). This results in the well-known
intervention formula, an early version of which appears in the article
by Davis (1984) (see also Spirtes, Glymour and Scheines, 1993; Pearl,
1993).
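The replacement of one factor by an indicator can be made concrete with a toy model. The following sketch assumes a hypothetical three-variable DAG (C points at X and Y, X points at Y) with made-up probabilities; it is not one of the graphs of this article.

```python
from itertools import product

P_C = [0.7, 0.3]                               # p(c)
P_X_GIVEN_C = [[0.9, 0.1], [0.3, 0.7]]         # p(x | c), indexed [c][x]
P_Y1_GIVEN_XC = {(0, 0): 0.1, (1, 0): 0.2,     # P(Y=1 | x, c), keyed (x, c)
                 (0, 1): 0.4, (1, 1): 0.6}


def joint(regime=None):
    """p(c, x, y) under the observational regime (regime=None) or under an
    intervention that sets X to the value `regime`."""
    table = {}
    for c, x, y in product((0, 1), repeat=3):
        if regime is None:
            p_x = P_X_GIVEN_C[c][x]               # natural mechanism p(x | pa(x))
        else:
            p_x = 1.0 if x == regime else 0.0     # indicator replaces p(x | pa(x))
        p_y = P_Y1_GIVEN_XC[x, c] if y == 1 else 1 - P_Y1_GIVEN_XC[x, c]
        table[c, x, y] = P_C[c] * p_x * p_y
    return table


def prob_y1(table, x=None):
    """P(Y=1), or P(Y=1 | X=x) when x is given."""
    rows = [(key, w) for key, w in table.items() if x is None or key[1] == x]
    return sum(w for key, w in rows if key[2] == 1) / sum(w for _, w in rows)


print(prob_y1(joint(regime=1)))   # P(Y=1) when X is set to 1
print(prob_y1(joint(), x=1))      # P(Y=1 | X=1) when X arises naturally
```

With these assumed numbers the two quantities differ, because C influences both X and Y in the observational regime.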



We want to stress that influence diagrams are more general than
causal DAGs, which have become a popular tool in epidemiology
(Greenland et al., 1999a). The assumptions underlying a causal DAG are
equivalent to those represented in an influence diagram that has
intervention nodes $\sigma_v$ and edges $\sigma_v \to v$ for every
node $v \in V$ in the DAG. The absence of directed edges from
$\sigma_v$ to any variable other than v translates for a causal DAG to
the requirement that all common causes of any pair of variables have
to be included in the graph. Hence, readers who are more familiar with
causal DAGs can think of influence diagrams as causal DAGs (ignoring
$\sigma_X$), but they are then making stronger assumptions. For a
critical view on causal DAGs see the article by Dawid (2010).

4.2 Population Causal Effect 

We give two definitions of causal effects that are 
relevant for the present article. They are in the spirit 
of similar definitions in the literature (Rubin, 1974; 
Robins, 1986; Pearl, 2000; Dawid, 2002). We for- 
mulate them first in terms of distribution and later 
specify particular causal parameters. 

A population causal effect is some contrast between the
post-intervention distributions $p(y \mid \sigma_X = x)$,
$x \in \mathcal{X}$, of Y for different interventions, for example,
setting X to $x_1$ as opposed to $x_2$. One could say that this is a
valid target of inference if we contemplate administering a treatment
to the whole population. Most radically one can say that X has a
causal effect on Y if for some values $x_1 \neq x_2 \in \mathcal{X}$
the two distributions $p(y \mid \sigma_X = x_i)$, $i = 1, 2$, differ
in some aspect. If one can estimate these post-intervention
distributions from observable data, then one can estimate any contrast
between them. When
$p(y \mid \sigma_X = x) \neq p(y \mid X = x; \sigma_X = \emptyset)$ we
say that the effect of X on Y is (marginally) confounded.¹ [Note that
as detailed by Greenland, Pearl and Robins (1999b) it is important to
treat confounding and noncollapsibility as distinct concepts.] We can
adjust for confounding if a set of variables C is observed satisfying
the following conditions (6) and (7) [in short we call this a
sufficient set of covariates (Dawid, 2002)]. Assume we know that
$Y \perp\!\!\!\perp \sigma_X \mid (X, C)$, that is,

$$p(y \mid C = c; \sigma_X = x) = p(y \mid X = x, C = c; \sigma_X = \emptyset), \tag{6}$$

meaning that once we know C and the value x, then it does not make a
difference whether X = x has been observed to happen by nature or by
intervention. If in addition

$$C \perp\!\!\!\perp \sigma_X, \tag{7}$$

that is, the covariates C are pre-treatment, then the
post-intervention distribution can be consistently estimated from
prospective data (provided C is observed). The post-intervention
distribution for setting X to x is obtained as

$$\begin{aligned}
p(y \mid \sigma_X = x)
  &= \sum_{x', c} p(y \mid c, x'; \sigma_X = x)\, p(x' \mid c; \sigma_X = x)\, p(c) \\
  &= \sum_{x', c} p(y \mid c, x'; \sigma_X = \emptyset)\, \delta\{x = x'\}\, p(c) \\
  &= \sum_{c} p(y \mid c, x; \sigma_X = \emptyset)\, p(c),
\end{aligned} \tag{8}$$

where the last step is due to (5). The quantities
$p(y \mid c, x; \sigma_X = \emptyset)$ and $p(c)$ can be consistently
estimated from prospective data on X, Y and C. As pointed out, for
example, by Clayton (2002), (8) corresponds to classical direct
standardization. The above conditions (6) and (7) are equivalent to
Pearl's (1995, 2000) so-called back-door criterion for causal graphs
(Lauritzen, 2000). If we cannot find a set of covariates that
satisfies (6) and (7), an alternative is to use an instrumental
variable, but we do not consider this any further here (see Angrist,
Imbens and Rubin, 1996; Didelez and Sheehan, 2007a).

¹ Note that "reverse causation" can occur, when in fact Y is the cause
of X, in which case we also have
$p(y \mid \sigma_X = x) \neq p(y \mid X = x; \sigma_X = \emptyset)$.
This is relevant in case-control studies, where it is not always
ensured that X is prior to Y; for example, when Y is coronary heart
disease and X is homocysteine level, one might argue that existing
atherosclerosis increases the homocysteine level. We do not consider
reverse causation as confounding.

Example continued. Consider again Figure 17. 
We can see that C = {Age, Occ, Smo, THist} sat- 
isfies (6) and (7). But these properties are also sat- 
isfied for the smaller set C = {Age, Smo}. Age and 
Smo are independent of $\sigma_{HRT}$, as can be seen from
the moral graph in Figure 18, and together with
HRT they separate Y and $\sigma_{HRT}$ as can be seen from
the second moral graph in Figure 19. This implies 
that in a prospective study we can ignore Occ and 
THist altogether and apply (8) to obtain the post- 
intervention distribution. 
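Formula (8) amounts to direct standardization and can be sketched on a table of counts. The counts and the stratum labels below are hypothetical and stand in for a single sufficient covariate C in a prospective study; they are not data from the HRT-TCI example, and the function names are illustrative.

```python
# Hypothetical prospective counts: COUNTS[c][x] = (number with Y=1, number observed).
COUNTS = {
    "young": {0: (10, 400), 1: (6, 100)},
    "old": {0: (30, 200), 1: (90, 300)},
}


def standardised_risk(counts, x):
    """Estimate P(Y=1 | sigma_X = x) as sum_c P(Y=1 | c, x) p(c), following (8)."""
    n_total = sum(n for stratum in counts.values() for _, n in stratum.values())
    risk = 0.0
    for stratum in counts.values():
        cases, n = stratum[x]
        p_c = sum(m for _, m in stratum.values()) / n_total   # estimate of p(c)
        risk += (cases / n) * p_c                              # P(Y=1 | c, x) p(c)
    return risk


def crude_risk(counts, x):
    """Unadjusted estimate of P(Y=1 | X = x), ignoring C."""
    cases = sum(stratum[x][0] for stratum in counts.values())
    n = sum(stratum[x][1] for stratum in counts.values())
    return cases / n


for x in (0, 1):
    print(x, standardised_risk(COUNTS, x), crude_risk(COUNTS, x))
```

The standardised and crude risks differ here because, in the made-up counts, exposure is more common in the stratum with the higher baseline risk.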



Fig. 18. Moral graph on Age, Occ, Smo, THist and $\sigma_{HRT}$ for
Figure 17.

If, instead, we were to investigate the causal effect 
of smoking on TCI we might assume an influence 
diagram as in Figure 20 (ignoring S). We can see by
a similar reasoning that C = {Occ} is a sufficient set 
of covariates. Note that, in this case, the mediating 
variable HRT must not be included in C as it does 
not satisfy (7). This illustrates that the population 
causal effect that is identified by conditions (6) and 
(7) is an overall or total effect, for example, the effect 
of smoking on TCI as potentially mediated by its 
effect on HRT. 

As can be seen from (8), the population causal ef- 
fect depends on the distribution of C in the popula- 
tion; this is not always desirable as it may mean that 
we cannot carry forward the results to another pop- 
ulation. Hence we consider the conditional causal 
effect next. 

4.3 Conditional Causal Effects 

A conditional causal effect is some contrast between the
post-intervention distributions conditional on some covariates C,
$p(y \mid C; \sigma_X = x)$, $x \in \mathcal{X}$ [for the moment C
need not be the same as in (8), but we get back to this]. Such a
conditional causal effect may be of interest if one wants to measure
how effective treatment is for a particular patient with known
characteristics such as gender, medical history, etc. It therefore
seems reasonable to assume that these covariates C satisfy (7). We
further assume that they also satisfy (6) because otherwise we would
need to take additional suitable covariates into account in order to
apply (8), so we might as well incorporate them immediately. Also, if
C satisfies both properties, the conditional causal effect does not
depend on the population distribution of covariates. With (6) and (7)
the conditional post-intervention distribution is automatically
identified if C is observed. Note that in order to obtain the
population causal effect using (8) we can choose any set C such that
(6) and (7) are satisfied, whereas when we consider the conditional
causal effect C could include more variables, for example, because
they are so-called effect modifiers. For example, in Figure 17 one may
be interested in the conditional causal effect given Age, Smo and
THist if the latter is thought to predict a different effect of HRT on
TCI, even though it is not necessary to adjust for THist to obtain the
population causal effect.

Fig. 19. Moral graph on all variables for Figure 17.

Fig. 20. Influence diagram for TCI example with intervention in
'smoking.'

As alluded to earlier, both the population and the conditional causal
effect are "total" causal effects, when C satisfies (7), in the sense
that they include direct as well as indirect effects of X on Y; for
example, the effect of smoking on TCI may be mediated by HRT. A
detailed treatment of this topic is beyond the scope of this article
but we refer to the works of Pearl (2001), Robins (2003), Didelez,
Dawid and Geneletti (2006) and Geneletti (2007) for the general
theory, and conditions of identifiability, of direct and indirect
effects especially in the nonlinear case.

4.4 Inference on Causal Effect 

We review testing for the causal effect based on prospective data. In
the broadest sense, the causal null hypothesis is that the
post-intervention distribution of Y, $p(y \mid \sigma_X = x)$ [or
possibly $p(y \mid c; \sigma_X = x)$ if we consider the conditional
causal effect], does not depend on the value x, that is, we do not
change the distribution of Y by setting X to different values. It is
clear from (8) that if there is no conditional causal effect, that is,
if $Y \perp\!\!\!\perp X \mid (C; \sigma_X = \emptyset)$, then there is
also no population causal effect. The converse is not necessarily
true, in particular when there are different effects in different
subgroups that may happen to cancel each other out such that there is
no overall effect in the whole population, that is,
$p(y \mid \sigma_X = x)$ is independent of x without
$Y \perp\!\!\!\perp X \mid (C; \sigma_X = \emptyset)$ being true; this
is known as lack of faithfulness (see Spirtes, Glymour and Scheines,
1993). Hence we suggest testing
$Y \perp\!\!\!\perp X \mid (C; \sigma_X = \emptyset)$ in order to
investigate the causal null hypothesis of no (conditional) causal
effect. If this independence can be rejected, then there is evidence
for a conditional causal effect, and (except in rare cases of such
lack of faithfulness) for a population effect.
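The cancellation described above is easy to reproduce with made-up numbers. This sketch assumes a binary sufficient covariate C with two equally large strata in which X has opposite effects of equal size; the probabilities are hypothetical.

```python
P_C = {0: 0.5, 1: 0.5}
# P(Y=1 | c, x) in the observational regime; with C assumed sufficient this
# equals P(Y=1 | c; sigma_X = x).  Keys are (c, x).
P_Y1 = {(0, 0): 0.2, (0, 1): 0.4,    # stratum C=0: setting X=1 raises the risk
        (1, 0): 0.5, (1, 1): 0.3}    # stratum C=1: setting X=1 lowers the risk


def population_risk(x):
    """P(Y=1 | sigma_X = x) obtained by standardising over p(c), as in (8)."""
    return sum(P_Y1[c, x] * P_C[c] for c in P_C)


conditional_effects = {c: P_Y1[c, 1] - P_Y1[c, 0] for c in P_C}
population_effect = population_risk(1) - population_risk(0)
print(conditional_effects, population_effect)
```

There is a conditional causal effect in each stratum, yet the population risk difference vanishes (up to floating-point error), so a test of the population effect alone would miss it.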

For estimation, we need to define the causal parameter of interest.
Much of the causal literature is based on the difference in
expectation, leading to the average population and average conditional
causal effect, $E(Y \mid \sigma_X = x_1) - E(Y \mid \sigma_X = x_2)$
(often denoted by ACE) and
$E(Y \mid C = c; \sigma_X = x_1) - E(Y \mid C = c; \sigma_X = x_2)$,
respectively.

Here, we focus instead on population and conditional causal odds
ratios as these are invariant to the marginal distributions and hence
applicable under outcome-dependent sampling, as will be seen. Assume
that Y and X are binary. The population causal odds ratio (COR) is
defined as

$$COR_{YX} = \frac{p(Y = 1 \mid \sigma_X = 1)\, p(Y = 0 \mid \sigma_X = 0)}{p(Y = 0 \mid \sigma_X = 1)\, p(Y = 1 \mid \sigma_X = 0)}.$$

Alternatively consider the conditional $COR_{YX}(C = c)$ where we
condition on the set of covariates C = c, that is,

$$COR_{YX}(C = c) = \frac{p(Y = 1 \mid c; \sigma_X = 1)\, p(Y = 0 \mid c; \sigma_X = 0)}{p(Y = 0 \mid c; \sigma_X = 1)\, p(Y = 1 \mid c; \sigma_X = 0)}.$$

This is distinct from the population $COR_{YX}$ when it is not
collapsible over C, just as for the associational odds ratio. When a
set of covariates C is sufficient to adjust for confounding, that is,
satisfies (6) and (7), then
$p(Y = 1 \mid c; \sigma_X = x) = p(Y = 1 \mid c, X = x; \sigma_X = \emptyset)$
and hence $COR_{YX}(C) = OR_{YX}(C)$. This means we can use
Corollary 3 in order to check whether C can be reduced, that is,
whether $OR_{YX}(C)$ and hence $COR_{YX}(C)$ is collapsible over a
subset of C.
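The distinction between the conditional and the population causal odds ratio can be seen in a two-stratum example. The sketch below specifies hypothetical post-intervention risks directly, so no confounding is involved; the numbers are chosen so that the conditional causal odds ratio is the same in both strata.

```python
P_C = {0: 0.5, 1: 0.5}
# Hypothetical P(Y=1 | c; sigma_X = x), keyed (c, x).
P_Y1 = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}


def odds(p):
    return p / (1 - p)


def conditional_cor(c):
    """COR_YX(C = c): odds ratio of the post-intervention risks given C = c."""
    return odds(P_Y1[c, 1]) / odds(P_Y1[c, 0])


def population_cor():
    """COR_YX: odds ratio after averaging the risks over p(c)."""
    risk = {x: sum(P_Y1[c, x] * P_C[c] for c in P_C) for x in (0, 1)}
    return odds(risk[1]) / odds(risk[0])


print(conditional_cor(0), conditional_cor(1), population_cor())
```

Both conditional causal odds ratios are about 6, while the population causal odds ratio is noticeably smaller, which illustrates noncollapsibility over C in the absence of confounding.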
Fig. 21. DAG representing simple case-control situation.

4.5 Causal Inference in Case-Control Studies 

Now we include the sampling variable S in our considerations. Note
that the targeted causal parameters do not involve S, so we only want
to make assumptions about the distribution of S under the
observational regime $\sigma_X = \emptyset$. In the simple situation
of a case-control study (without matching) sampling is just on the
values of Y. Therefore we assume that

$$S \perp\!\!\!\perp (C, X) \mid (Y; \sigma_X = \emptyset) \tag{9}$$

and in addition (6) and (7). These together imply the following
factorization:

$$p(y, c, x, s \mid \sigma_X = \emptyset) = p(s \mid y; \sigma_X = \emptyset)\, p(y \mid c, x)\, p(x \mid c; \sigma_X = \emptyset)\, p(c). \tag{10}$$

The influence diagram in Figure 21 represents slightly stronger
restrictions, as it implies $S \perp\!\!\!\perp \sigma_X \mid Y$, which
does not follow from (6), (7) and (9); that is, we do not specify any
assumptions about the distribution of S under $\sigma_X = x$ as this
is not relevant to the target of inference. We will nevertheless use
influence diagrams like Figure 21 to represent jointly our assumptions
about the sampling process and the contemplated intervention.

The data come from the distribution
$p(y, c, x \mid S = 1; \sigma_X = \emptyset)$, given by

$$\frac{p(S = 1 \mid y)\, p(y \mid c, x)\, p(x \mid c; \sigma_X = \emptyset)\, p(c)}{p(S = 1 \mid \sigma_X = \emptyset)}, \tag{11}$$

similar to (4). The moral graph in Figure 22 includes an edge between
$\sigma_X$ and C as the conditional distribution of C given S is not
the same for different regimes $\sigma_X$. Hence if individuals are
selected based on their case or control status, we cannot expect the
distribution of the covariates to be the same in a scenario where the
risk factor X has been manipulated by external intervention as in a
scenario where it has been left to arise naturally.

The following theorem revisits the well-known result that the causal
effect of X on Y can be tested, and the causal odds ratio estimated,
from case-control data (Breslow, 1996). The target of inference is
$COR_{YX}(C)$, based on $p(y \mid c; \sigma_X = x)$, which is
prospective in the sense that we want to predict the effect of
manipulating X on Y after knowing C without conditioning on S, while
we have only the retrospective information
$p(y \mid c, S = 1; \sigma_X = \emptyset)$ available. The following
theorem allows S to depend on the covariates C as well as on Y.

Theorem 7. Under (6), (7) and assuming
$S \perp\!\!\!\perp X \mid (C, Y; \sigma_X = \emptyset)$, we can
(i) test the null hypothesis of no conditional causal effect of X on Y
given C by testing
$X \perp\!\!\!\perp Y \mid (C, S = 1; \sigma_X = \emptyset)$
(regardless of the measurement scales), and (ii) consistently estimate
$COR_{YX}(C)$ by estimating $OR_{YX}(C, S = 1)$ (for categorical X, Y).

Proof. (i) Earlier we argued that a test for
$Y \perp\!\!\!\perp X \mid (C; \sigma_X = \emptyset)$ can replace a
test of the null hypothesis of no causal effect when C satisfies (6)
and (7). As $S \perp\!\!\!\perp X \mid (C, Y; \sigma_X = \emptyset)$,
Theorem 6 completes the argument.

(ii) Assumptions (6) and (7) imply $COR_{YX}(C) = OR_{YX}(C)$ as
explained earlier. With Corollary 4 we see that $OR_{YX}(C, S)$ is
collapsible over S when
$S \perp\!\!\!\perp X \mid (C, Y; \sigma_X = \emptyset)$, which
completes the proof. □
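Part (ii) of Theorem 7 can be illustrated by simulation. The sketch below is a toy case-control design with hypothetical parameters: a binary covariate C that affects exposure, outcome and the sampling of controls, and a logistic outcome model whose coefficient of X is assumed to be the conditional causal log odds ratio (i.e., C is taken to be sufficient). None of the numbers relate to the HRT-TCI data.

```python
import math
import random


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def simulate_case_control(n=200_000, true_log_or=1.0, seed=1):
    """Return counts[c][x][y] among sampled units (S=1) of a simulated population."""
    rng = random.Random(seed)
    counts = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
    for _ in range(n):
        c = int(rng.random() < 0.4)                 # covariate and matching variable
        x = int(rng.random() < 0.2 + 0.4 * c)       # exposure depends on C
        y = int(rng.random() < expit(-3.0 + true_log_or * x + 1.2 * c))
        # Sampling depends on (C, Y) only: every case, and a C-dependent
        # fraction of the controls.
        p_sample = 1.0 if y == 1 else (0.1, 0.3)[c]
        if rng.random() < p_sample:
            counts[c][x][y] += 1
    return counts


def stratum_log_or(counts, c):
    """Sample log odds ratio between X and Y within stratum C = c."""
    t = counts[c]
    return math.log(t[1][1] * t[0][0] / (t[1][0] * t[0][1]))


counts = simulate_case_control()
estimates = [stratum_log_or(counts, c) for c in (0, 1)]
print(estimates)
```

Although cases are heavily over-represented in the sample, the stratum-specific estimates should lie close to the value of `true_log_or` used to generate the population, up to sampling error.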

In Theorem 7, as far as testing is concerned, we 
are not restricted to the categorical situation and 
can use as test statistic whatever seems appropriate 
given the measurement scales of X,Y,C. If this in- 
dependence is rejected, then there is evidence for a 
causal effect. In the particular case of binary Y and 
continuous X it is well known that we can still also 
consistently estimate the odds ratio using a logistic 
regression (Prentice and Pyke, 1979). Their result, 
however, relies on the logistic link being justified, 
while the results on odds ratios when X and Y are 
both categorical, such as Theorem 7(ii), make no 
parametric assumptions. 

The set C in Theorem 7 needs to contain a suf- 
ficient set of covariates so as to justify (6). But it 
also needs to contain any matching variables, even 
if these are not needed for (6), in order to justify 




Fig. 22. Moral graph for simple case-control situation. 
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^HRT tTCV 



Table 4 

Conditional causal odds ratio between Smo and TCI given 
(Age, THist) 




THist 



Fig. 23. Influence diagram for TCI example with matched 
sampling. 



S _LL X\{C, Y; ax = 0). This has been illustrated in 
Figure 8, with the variable B which is not needed 
to adjust for confounding. Hence, a sufficient set of 
covariates and the matching variables are required 
for Theorem 7 to work. However, typically a much 
larger set of covariates has been observed; one can 
then use Corollary 5 to reduce it without losing in- 
formation, as in the following example. 

Example continued . In the HRT- TCI exam- 
ple, as the study design was case-control matched by 
Age, we need to make sure that C contains Age. But 
we already saw that C = {Age, Smo} is a set of suffi- 
cient covariates. Hence, all assumptions of Theorem 
7 are satisfied with this choice of C (check these on 
the influence diagram in Figure 23). That is, we can 
consistently estimate the causal odds ratio between 
HRT and TCI given Age, Smo from the available 
data. 

Alternatively, if the target is the conditional causal 
odds ratio given all covariates, then we can see that 
with the choice of C = {Age, Occ, THist, Smo} the 
conditions of Theorem 7 are satisfied; we can es- 
timate the causal odds ratio HRT and TCI given 
C from the available data, but we can additionally 
omit Occ due to the conditional odds ratio being 
collapsible over this variable. Note that it is not fur- 
ther collapsible over the variable THist, implying 
that the causal odds ratio given C = {Age, Smo} 
is different from the causal odds ratio given C = 
{Age, Occ, THist, Smo}, though both conditioning 
sets are sufficient to adjust for confounding under 
our assumptions. As mentioned before, THist could 
be an effect modifier and might therefore be in- 
cluded. 

Assume now that we are instead interested in the effect of smoking
(Smo) on TCI and that the assumptions encoded in Figure 20 are
satisfied. So far we have targeted the conditional causal odds ratio
between exposure and response given all covariates; however, if we
include the mediator HRT into C, then it does not satisfy condition
(7) as it is a descendant of Smo. Hence we could consider C =
{Age, Occ, THist} and find that $OR_{Smo,TCI}(C, S)$ can again be
collapsed over Occ. The resulting estimates are shown in Table 4.
Note that these describe the "total" effect of Smo on TCI including
possible mediation via HRT (but conditional on Age and THist).

Smoking level    log-OR (stdev)    OR
Never            Reference
Former           0.41 (0.21)       1.51
1-10             0.89 (0.19)       2.43
11-20            0.97 (0.18)       2.65
21+              1.19 (0.41)       3.29
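The OR column of Table 4 is the exponential of the log-OR column. The following minimal sketch makes that back-transformation explicit. It also adds an approximate 95% interval, which is our own addition and not reported in the paper: it assumes that the quoted "stdev" can be read as a standard error and that a normal approximation on the log scale is adequate. The function name `odds_ratio_with_interval` is hypothetical.

```python
import math

# (smoking level, log-OR, stdev, OR as printed in Table 4)
TABLE_4 = [
    ("Former", 0.41, 0.21, 1.51),
    ("1-10", 0.89, 0.19, 2.43),
    ("11-20", 0.97, 0.18, 2.65),
    ("21+", 1.19, 0.41, 3.29),
]


def odds_ratio_with_interval(log_or, stdev, z=1.96):
    """Back-transform a log odds ratio and a Wald-type interval.

    Assumption (not from the paper): stdev is a standard error and the
    estimator is approximately normal on the log scale.
    """
    return (
        math.exp(log_or),
        math.exp(log_or - z * stdev),
        math.exp(log_or + z * stdev),
    )


for level, log_or, stdev, printed in TABLE_4:
    or_hat, lower, upper = odds_ratio_with_interval(log_or, stdev)
    print(
        f"{level:>6}: OR = {or_hat:.2f} (printed {printed}), "
        f"approx. 95% interval {lower:.2f} to {upper:.2f}"
    )
```

Small differences between the recomputed and the printed odds ratios are to be expected, since the printed log-ORs are themselves rounded to two decimals.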

5. EXTENSIONS AND FURTHER EXAMPLES 

In this section we consider more general data situ- 
ations where the sampling depends in a less obvious 
way on the outcome and possibly on further vari- 
ables. In particular, we extend the previous results 
to the case where a sufficient set of covariates (and 
possibly matching variables) C does not allow us 
to collapse over S. In such cases taking further vari- 
ables into account can sometimes provide a solution. 
Let us start with an example. 

Example. Weinberg, Baird and Rowland (1993) 
and Slama et al. (2006) considered 'time-to-pregnancy' 
studies which are of interest when investigating fac- 
tors affecting fertility. Typically X is exposure, such 
as a toxic substance or smoking, and Y is the time 
to pregnancy; common covariates C such as age, so- 
cioeconomic background, etc., may be taken into ac- 
count. The problem here is that if women are sam- 
pled who became pregnant during a certain time in- 
terval (retrospective sampling), then long duration 
to pregnancy automatically means earlier initiation 
time. However, initiation time might predict the ex- 
posure if it has changed over time, for example, be- 
cause precautions regarding toxic substances have 
increased or smoking habits in the population have 
changed over time. Therefore, Y and X may be associated given C even
if there is no causal effect of exposure, that is, C is not
"bias-breaking." Note that the same phenomenon also occurs with
current duration designs and that prospective sampling has many
other drawbacks in "time-to-pregnancy" studies as discussed in
detail by Slama et al. (2006).

Fig. 24. Graphical representation of assumption in
time-to-pregnancy example.

The key to solving this problem is to find a bias-breaking variable
Z such that either X or Y can reasonably be assumed conditionally
independent of S given Z (and observed covariates) and to use
Corollary 3 so as to further collapse over Z. The method proposed by
Weinberg, Baird and Rowland (1993) relies on using the time of
initiation. It seems plausible that once the initiation time Z and
time to pregnancy Y are known, the sampling S is not further
associated with the exposure X, typically controlling for relevant
covariates C, that is, $S \perp\!\!\!\perp X \mid (Z, Y, C)$.
Further, we may sometimes be able to justify that
$Y \perp\!\!\!\perp Z \mid (X, C)$, that is, that the initiation
time itself, once we account for relevant factors and regardless of
whether the unit is sampled or not, should not predict time to
pregnancy. This assumption might be violated if there are other
relevant factors that have changed over time and that are not
captured by C or X.

Using again $\sigma_X$ as intervention indicator and assuming that C
is a sufficient set of covariates, we can summarize our assumptions
about the time-to-pregnancy example through the following
independencies: $Z \perp\!\!\!\perp \sigma_X$,
$Y \perp\!\!\!\perp (Z, \sigma_X) \mid (X, C)$ and
$S \perp\!\!\!\perp (X, \sigma_X) \mid (Z, Y, C)$. The graph in
Figure 24 represents these conditional independence assumptions (the
edge C → Z could be replaced by C ← Z). If it were not for the
retrospective sampling, the causal effect of X on Y could be
analyzed ignoring Z, as C is assumed a sufficient set of covariates.
The selection effect becomes apparent when checking for graph
separation, which yields a moral edge between Z and Y when
conditioning on S (cf. moral graph in Figure 26).
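The graph-separation check described above can be sketched in a few lines of code. The sketch uses the standard moralization criterion: restrict the DAG to the smallest ancestral set of the variables involved, connect every node to its parents and the parents to each other, drop the directions, and check whether the conditioning set blocks all paths. The parent sets in `FIG24_PARENTS` are an assumption on our part. They are one DAG compatible with the three independencies listed in the text (including a hypothetical edge from C to S) and are not copied from Figure 24; the function names are likewise hypothetical.

```python
def ancestral_set(parents, nodes):
    """All given nodes together with their ancestors in the DAG."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def separated(parents, a, b, given):
    """True if `given` separates a from b in the moral graph of the
    smallest ancestral set containing a, b and given."""
    a, b, given = set(a), set(b), set(given)
    keep = ancestral_set(parents, a | b | given)
    neighbours = {v: set() for v in keep}
    for child in keep:
        family = [child, *parents.get(child, ())]
        # Moralization: join the child to its parents and the parents
        # to each other.
        for v in family:
            neighbours[v].update(w for w in family if w != v)
    reached, stack = set(a - given), list(a - given)
    while stack:
        for w in neighbours[stack.pop()] - given - reached:
            reached.add(w)
            stack.append(w)
    return not (reached & b)


# Assumed parent sets (our reading of the stated independencies).
FIG24_PARENTS = {
    "X": ["sigma_X", "C", "Z"],
    "Y": ["X", "C"],
    "Z": ["C"],
    "S": ["Z", "Y", "C"],
}

checks = {
    "Z indep. sigma_X": ({"Z"}, {"sigma_X"}, set()),
    "Y indep. (Z, sigma_X) given (X, C)": ({"Y"}, {"Z", "sigma_X"}, {"X", "C"}),
    "S indep. (X, sigma_X) given (Z, Y, C)": ({"S"}, {"X", "sigma_X"}, {"Z", "Y", "C"}),
    "Y indep. Z given (X, C, S)": ({"Y"}, {"Z"}, {"X", "C", "S"}),
}
for label, (a, b, given) in checks.items():
    print(label, "->", separated(FIG24_PARENTS, a, b, given))
```

Under the assumed graph the first three statements hold, while the last one fails: once S is in the conditioning set, moralization joins its parents Z and Y, which is the selection effect described in the text.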

The following theorem shows that exploiting the 
initiation time Z in the above example can indeed 
facilitate inference about the causal effect. 



Fig. 25. Moral graph for DAG in Figure 24 marginal over S. 

Theorem 8. We can test for a (conditional) causal effect of X on Y
(given C) if there exists a set of observable variables Z such that
all following conditions are satisfied:

(i) $S \perp\!\!\!\perp X \mid (Y, Z, C; \sigma_X = \emptyset)$,

(ii) $Y \perp\!\!\!\perp Z \mid (X, C; \sigma_X = \emptyset)$,

(iii) C is a sufficient set of covariates,

(iv) the joint distribution $p(y, x, z \mid \sigma_X = \emptyset)$
is strictly positive.

The causal null hypothesis is then equivalent to
$Y \perp\!\!\!\perp X \mid (Z, C, S = 1; \sigma_X = \emptyset)$.

Proof. As argued before, testing
$Y \perp\!\!\!\perp X \mid (C; \sigma_X = \emptyset)$ provides a
test for the causal null hypothesis. We show that it is equivalent
to $Y \perp\!\!\!\perp X \mid (Z, C, S = 1; \sigma_X = \emptyset)$.
As all conditional independencies are conditional on
$\sigma_X = \emptyset$, it will be omitted from the notation.

Remember that with (i) and Theorem 6,
$Y \perp\!\!\!\perp X \mid (Z, C)$ is equivalent to
$Y \perp\!\!\!\perp X \mid (Z, C, S = 1)$. First assume that
$Y \perp\!\!\!\perp X \mid C$. With (ii) we obtain
$Y \perp\!\!\!\perp X \mid (Z, C)$, which is equivalent to
$Y \perp\!\!\!\perp X \mid (Z, C, S = 1)$. For the converse, assume
$Y \perp\!\!\!\perp X \mid (Z, C, S = 1)$, hence
$Y \perp\!\!\!\perp X \mid (Z, C)$. Now, (iv) is sufficient to
ensure (Lauritzen, 1996, page 29) that this conditional independence
together with (ii) yields $Y \perp\!\!\!\perp (X, Z) \mid C$, which
implies $Y \perp\!\!\!\perp X \mid C$. □

Example continued. Consider again the time-to-pregnancy study
represented in Figure 24. We see that all assumptions of Theorem 8
are satisfied: C and $\sigma_X$ are nondescendants of each other and
have no parents; from the moral graph on Y, Z, X, C (Figure 25) it
follows that C is a sufficient set of covariates, and that (ii)
holds, while part (i) can be seen from the moral graph on all nodes
in Figure 26. Hence we can investigate the null hypothesis of no
causal effect by testing whether Y and X are associated
conditionally on Z and C for the sampled subjects.

Fig. 26. Moral graphs for the DAG in Figure 24 including S.
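The role of the bias-breaking variable in this test can be illustrated with an exact calculation on a small discrete model. The sketch below is not taken from the paper. It assumes binary Z, X and Y, leaves C out for brevity, and uses made-up probabilities that respect conditions (i) and (ii) of Theorem 8 under the causal null hypothesis. All names and numbers are hypothetical.

```python
from itertools import product
from math import isclose

# Made-up model: p(z) p(x | z) p(y | x) p(S = 1 | z, y).
P_Z = 0.5                        # P(Z = 1)
P_X_GIVEN_Z = {0: 0.2, 1: 0.7}   # exposure prevalence depends on Z
P_Y_GIVEN_X = {0: 0.3, 1: 0.3}   # causal null: Y depends on neither X nor Z
P_S_GIVEN_ZY = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.1}


def bern(p, value):
    """Probability of `value` for a binary variable with P(1) = p."""
    return p if value == 1 else 1.0 - p


def sampled_joint():
    """Unnormalized p(z, x, y, S = 1) from the factorization above."""
    return {
        (z, x, y): bern(P_Z, z)
        * bern(P_X_GIVEN_Z[z], x)
        * bern(P_Y_GIVEN_X[x], y)
        * P_S_GIVEN_ZY[(z, y)]
        for z, x, y in product((0, 1), repeat=3)
    }


def odds_ratio(cell):
    """Odds ratio of a 2x2 table given as a function cell(x, y)."""
    return cell(1, 1) * cell(0, 0) / (cell(1, 0) * cell(0, 1))


joint = sampled_joint()
or_within_z = {
    z: odds_ratio(lambda x, y, z=z: joint[(z, x, y)]) for z in (0, 1)
}
or_ignoring_z = odds_ratio(lambda x, y: joint[(0, x, y)] + joint[(1, x, y)])

print("OR_YX(Z = z, S = 1):", or_within_z)
print("OR_YX(S = 1), Z ignored:", round(or_ignoring_z, 3))
```

Within each level of Z the odds ratio among the sampled units equals 1, in line with the causal null. Ignoring Z gives an odds ratio of roughly 0.18 for these particular numbers, that is, a spurious association created purely by the sampling.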

Remarks. (a) Theorem 8 is symmetric in X, Y, in the sense that they
can be swapped in (i) and (ii).

(b) If $Z = \emptyset$, the conditions are the same as for the
matched case-control situation of Section 4.5 (Theorem 7).

(c) Further, the theorem does not require that
$Z \perp\!\!\!\perp \sigma_X$, that is, that the bias-breaking
variable Z is not affected by an intervention in X; hence in Figure
24 the arrow from Z to X could be reversed (this is not plausible in
the time-to-pregnancy scenario that we have used, but may be
relevant in other scenarios).

It is easy to find realistic examples where the as- 
sumptions of Theorem 8 are not satisfied; for ex- 
ample, Robins (2001) explained why it is difficult to 
identify the effect of hormone treatment on endome- 
trial cancer in case-control studies. In such situa- 
tions, additional outside information can sometimes 
be used to obtain identifiability (see, e.g., Geneletti, 
Richardson and Best, 2009). 

In addition to the above result about testing the 
causal effect, we also have the following about esti- 
mating it when all variables are discrete. 

Theorem 9. Under the assumptions of Theorem 8, we can consistently
estimate the (conditional) causal odds ratio of X on Y given C,
$COR_{YX}(C)$, by estimating $OR_{YX}(Z, C, S = 1)$.

Proof. From (iii) it follows that $COR_{YX}(C) = OR_{YX}(C)$, that
is, the causal odds ratio is equal to the observational odds ratio.
Further, using Corollary 3 with $B_1 = S$ and $B_2 = Z$, (i) yields
that $OR_{YX}(Z, C, S)$ is collapsible over S in the observational
regime and (ii) means that $OR_{YX}(Z, C)$ is collapsible over Z,
hence $OR_{YX}(Z, C, S)$ is collapsible over $(Z, S)$. □
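For binary X and Y the two collapsibility steps of the proof can be written out directly. The following is only a sketch under the assumptions of Theorem 8, with all quantities in the observational regime; the shorthand $\pi_{xy}$ is introduced here for illustration and does not appear in the original.

```latex
% Condition (i), S \perp X | (Y, Z, C), gives the factorization
\[
p(x, y \mid z, c, S = 1)
  = \frac{p(x \mid z, c)\, p(y \mid x, z, c)\, p(S = 1 \mid y, z, c)}
         {p(S = 1 \mid z, c)} .
\]
% Write \pi_{xy} = p(Y = y | X = x, z, c). In the odds ratio the factors
% p(x | z, c) and p(S = 1 | y, z, c) each occur once in the numerator
% and once in the denominator, so they cancel:
\[
OR_{YX}(z, c, S = 1)
  = \frac{p(1, 1 \mid z, c, S = 1)\, p(0, 0 \mid z, c, S = 1)}
         {p(1, 0 \mid z, c, S = 1)\, p(0, 1 \mid z, c, S = 1)}
  = \frac{\pi_{11}\, \pi_{00}}{\pi_{10}\, \pi_{01}}
  = OR_{YX}(z, c) .
\]
% Condition (ii), Y \perp Z | (X, C), means \pi_{xy} = p(Y = y | X = x, c),
% so the right-hand side does not depend on z, and (iii) identifies it
% with the causal odds ratio:
\[
OR_{YX}(z, c) = OR_{YX}(c) = COR_{YX}(c) .
\]
```

Strict positivity, condition (iv), ensures that none of the cancelled factors is zero.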

Remarks. (a) In the situation of Theorem 9 it does not necessarily
hold that $OR_{YX}(C, S = 1) = OR_{YX}(C)$, that is, Corollary 5
(with B = Z) does not apply as neither part (i) nor part (ii) of
that corollary is satisfied. However, Corollary 5 can be used to
further reduce the set C if $OR_{YX}(C)$ is collapsible over a
subset of C. The difference between the above Theorem 9 and
Corollary 5 is that while the latter focuses on such a reduction of
dimensionality, the former exploits the fact that $OR_{YX}(C)$ is
the same in a higher dimensional model $OR_{YX}(C, Z, S = 1)$. This
is useful because we can only estimate quantities conditional on
S = 1 while $OR_{YX}(C) \neq OR_{YX}(C, S = 1)$.

(b) Theorem 9 implies that the assumptions of Theorem 8 can be
tested to a certain extent as they imply that
$OR_{YX}(Z = z, C, S = 1) = OR_{YX}(Z = z', C, S = 1)$ for
$z \neq z'$. Hence, estimates should not vary much for different
values of Z.

Fig. 27. (a) DAG and relevant moral graphs for Hernan,
Hernandez-Diaz and Robins (2004) example; (b) is moral graph on all
variables, (c) on C and $\sigma_X$, (d) on Y, X, C and $\sigma_X$.
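One way to act on Remark (b) is to compare the stratum-specific odds ratios across the levels of Z among the sampled units. The sketch below uses a Woolf-type homogeneity statistic, which is a standard device that we substitute here for illustration; the paper does not prescribe a particular test. The counts are made up, the function names are hypothetical, and all cells are assumed to be positive.

```python
import math


def log_odds_ratio(table):
    """Return (log-OR, approximate variance) for a 2x2 table
    ((a, b), (c, d)) with positive counts."""
    (a, b), (c, d) = table
    return math.log(a * d / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def woolf_homogeneity(tables):
    """Woolf-type statistic: weighted squared deviations of the stratum
    log-ORs from their inverse-variance pooled value."""
    estimates = [log_odds_ratio(t) for t in tables]
    weights = [1 / var for _, var in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    statistic = sum(
        w * (est - pooled) ** 2 for w, (est, _) in zip(weights, estimates)
    )
    return statistic, len(tables) - 1, pooled


# Made-up counts among sampled units (S = 1), one table per level of Z:
# rows are X = 1 / X = 0, columns are Y = 1 / Y = 0.
tables_by_z = {
    "z": ((40, 60), (25, 75)),
    "z'": ((55, 45), (38, 62)),
}
statistic, df, pooled = woolf_homogeneity(list(tables_by_z.values()))
print(
    f"pooled OR = {math.exp(pooled):.2f}, "
    f"homogeneity statistic = {statistic:.3f} on {df} df"
)
```

Under homogeneity the statistic is approximately chi-squared with the stated degrees of freedom, so a large value speaks against the assumptions of Theorem 8. A small value does not confirm them, since only some violations show up as heterogeneity across Z.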

We conclude with an example for potential se- 
lection bias that is not due to outcome-dependent 
sampling but is also covered by the conditions of 
Theorem 9 due to their symmetry in X and Y. 

Example. Hernan, Hernandez-Diaz and Robins (2004) considered the
example illustrated in Figure 27. In a study with HIV patients X is
anti-retroviral therapy, Y is AIDS, U is the true level of
immunosuppression and C is a collection of symptoms as well as
measurements on CD4 counts. Further covariates would typically be
included but for simplicity we omit them here. We assume X is
randomized so it has no graph parents. The fact that S depends on C
and X represents that patients with worse symptoms and side effects,
predicted by treatment and baseline covariates, are more likely to
drop out and not be available for the analysis. (Note that if it
were not for having to condition on S = 1, then we could estimate
the population causal effect of X on Y without further adjustment.)
We can verify that the odds ratio between X and Y is not collapsible
over S from Figure 27(b): neither $X \perp\!\!\!\perp S$ nor
$Y \perp\!\!\!\perp S$. However, if we consider the conditional
causal effect of X on Y given C, then we can collapse over S. With
$Z = \emptyset$, all conditions of Theorem 9 with X and Y
interchanged are satisfied, so we can estimate $COR_{YX}(C)$ by
using $COR_{YX}(C, S = 1)$: condition (i) can be seen from the moral
graph in Figure 27(b), condition (ii) is redundant, condition (iii)
can be seen from Figure 27(c) and (d).

6. CONCLUSION 

As the sampling or selection mechanism can often 
create complications and bias in statistical analyses, 
we argued in Section 2.4 that the basic assumptions 
about the sampling, in terms of conditional inde- 
pendence, should be made explicit using graphical 
models including a node for the binary sampling 
indicator. We demonstrated how this allows us to 
characterize, with simple graphical rules, situations 
in which we can collapse the (conditional) odds ra- 
tio over S (Corollary 4) or, more generally, when we 
can test for a (conditional) association (Theorem 6). 
Addressing specifically causal inference, Theorem 7
specifies the additional assumptions required to test 
for a causal effect or estimate a (conditional) causal 
odds ratio under outcome-dependent sampling, such 
as in a matched case-control design. Theorems 8 and 
9 extend these results to more general situations 
with less obvious outcome-dependent sampling. Our 
results are therefore relevant to a range of study de- 
signs, case-control being the most common, but also, 
for example, retrospective sampling that is condi- 
tional on reaching a certain state, such as time-to- 
pregnancy studies. 

We have shown how different types of graphical 
models can be used to express assumptions about 
the sampling process, admitting more flexibility than 
if restricted to causal DAGs (but as explained in Sec- 
tion 4.1 our results are also valid for the latter). In 
addition to directed acyclic and undirected graphs, 
we want to point out that chain graphs provide a 
further class of useful models. The original analy- 
sis of the TCI data, for instance, used chain graphs 
(Pedersen et al., 1997). The causal interpretation of 
chain graphs, however, is more complicated than for 
DAGs (cf. Lauritzen and Richardson, 2002). 

As any type of graph only encodes presence or absence of
conditional independencies, it cannot represent particular
parametric assumptions or properties of the model and selection
process. Consequently, any inference other than testing or
estimating odds ratios will typically require such additional
assumptions, which in turn will need to be scrutinized and
complemented by a sensitivity analysis. We therefore regard the use
of graphical models in this context as an important first step of
the analysis, facilitating the structuring and reasoning about the
problem of outcome-dependent sampling.

Concerning the question of causal inference, we have mainly assumed
an approach of adjusting for confounding by conditioning on suitable
covariates in the analysis. A different way of using covariates is
via the propensity score (Rosenbaum and Rubin, 1983) or inverse
probability weighting (Robins, Hernan and Brumback, 2000). Little is
known as yet on how to adapt these to case-control studies or
general outcome-dependent sampling, but see the work of Robins,
Rotnitzky and Zhao (1994, Section 6.3), Newman (2006), Mansson et
al. (2007) and van der Laan (2008).